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Single Error-Correcting Codes for Nonbinary 
Balanced Channels* 


C. W. HELSTROM{ 


Summary—Close-packed, single error-correcting codes are 
studied, the letters of which are N-tuples of M-ary digits, where VW 
is the power of a prime. The length N of the letters must be given 
by N = (M* — 1)/(M — 1), where k is an integer. A balanced 
communication channel, for which all errors in a transmitted digit 
are equally likely, is defined and a physical model given. The 
probability of correct reception of the code letters and the rate with 
which they transmit information in a balanced channel are calcu- 
lated. This involves deriving formulas for the numbers of code letters 
having various numbers of 0’s. Numerical results are given for a 
quaternary code and are extended to the case where the quaternary 
channel has a null zone, so that erasures as well as errors may 
occur. For the type of signals and noise assumed, the balanced 
channel without a null zone is found tc yield the better performance. 


I. DESCRIPTION OF THE CODES 


HE LETTERS of the code alphabets to be con- 
“Petes: here are sets of N digits, each of which 
can take on any of M values. For the most part 
we shall assume that M is an integral power of a prime 
number p and that MZ > 2. We shall restrict ourselves 
to close-packed codes that correct all single errors in sets 
of N received digits, an error being defined as any altera- 
tion of a transmitted digit. The length of each code letter 
is given by 
M'—1 
a aS () 
where k is any integer greater than 1, and the alphabet 
contains M*™“ letters. The term “close-packed” means 
that any of the M™ possible sets of N digits either is a 
code letter or differs from a code letter in only one place. 
We shall describe some of the properties of these codes 
and show how to evaluate their performance in a particu- 
larly simple type of communication channel. 

The existence of these codes for M (the power of a prime) 
was demonstrated by Zaremba,’ who used the methods 
of group theory. Such codes for M prime have been 
studied by Ulrich’ and Shapiro and Slotnick.® An example 
of this kind of code for M = 4, k = 2 was given by Golay.* 


* Received by the PGIT, December 1, 1959; revised manuscript 
received, August 17, 1960. Westinghouse Res. Labs., Pittsburgh, 
Pa., Scientific Paper 412FF471-P1. 

+ Dept. of Math., Westinghouse Res. Labs., Pittsburgh, Pa. 

18. K. Zaremba, ‘‘Covering problems concerning Abelian 

groups,” J. London Math. Soc., vol. 27, pp. 242-246; April, 1952. 
2W. Ulrich, ‘“‘Non-binary error correction codes,” Bell Sys. 

Tech. J., vol. 36, pp. 13841-1388; November, 1957. 

3H.S. Shapiro and D. L. Slotnick, ‘““On the mathematical theory 
of error-correcting codes,” JBM J. Res. Dev., vol. 3, pp. 25-34; 
January, 1959. 

4M. J. E. Golay, “Notes on the penny-weighing problem, loss- 
less symbol coding with nonprimes, etc.’”? IRE Trans. on INFoR- 
MATION THEORY, vol. IT-4, pp. 103-109; September, 1958. 


Cocke’ has shown how close-packed, single error-cor- 
recting codes can be constructed by associating the several 
digits with the elements of a Galois field GF(M) of order 
M, where M is any power of a prime number, M = 7’. 
Under the operation of addition, the elements of the field 
form an Abelian group of type® (p, p, --- , p) (r times). 
That is, there are 7 elements such that the M = p’ field 
elements can be generated by adding them to each other 
in all possible ways. Each of these 7 basis elements is of 
order p under addition. Here ‘‘1”’ is the identity element 
under field multiplication. The multiplicative group is also 
Abelian, but cyclic and of order p” — 1. The code letters 
are N-tuples of elements of the Galois field. One code letter 
will be the N-tuple J = (0, 0, --- , 0) consisting of all 
0’s, where 0 is the identity element under the field opera- 
tion of addition. All the rest of the code letters must have 
at least three nonzero digits in order for single errors to 
be uniquely correctable.° 

The digits y; of each code letter satisfy k linear relations 
of the form 


N 
Dy Gy; — 0, t= 1,2, --*,k; (2) 
i=1 
where the a;; are field elements, and the additions and 
multiplications are carried out by the rules for the Galois 
field. These equations are analogous to the parity-check 
relations for binary codes. Cocke’ has shown that if k is 
linearly independent such relations can always be found, 
and that the resulting code will correct all single errors. 
As an example, we refer to the code listed by Golay* 
in (10) of his paper. The field of order 4 can be described 
by the relations’ 


ta+2z2=0, aa’ =1, 


lta+a =0, 


where x is any of the four elements 0, 1, a, and a”. In 
Golay’s code letters we replace the digits “2” and “3” 
by a and a” respectively. Then the digits of those letters 
satisfy the linear relations 


Gi Ve. te va ease 
at, + t% +a°’rx, +2, = 0. 


(3) 


* J. Cocke, “Lossless symbol coding with nonprimes,”’ IRE 
Trans. ON InrorMaTION Tuuory, vol. IT-5, pp. 33-34; March, 
1959. 

° A. Speiser, “Die Theorie der Gruppen von endlicher Ordnung,”’ 
Dover Publications, Inc., New York, N. Y.; 1945. See ch. 3, pp. 
46-64. 
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quaternary code equivalent to this code was inde- 
ndently discovered by Scherer.’ We shall evaluate its 
rformance in a later section of this paper. 

The letters of these codes are themselves elements of 
| Abelian group of order M*“, which in turn is a sub- 
oup of the group of order M™ containing all possible 
-tuples of M-ary digits. The group operation is digit- 
-digit (vectorial) field addition. Following Slepian® we 


n divide the latter group into M“ cosets by “factoring” ’ 


it the subgroup of code letters. We take as the “leader” 
each coset the N-tuple with the greatest number of 
s; these coset leaders are just the (MM — 1)N N-tuples 
ith (VY — 1) 0’s and with one of the (M — 1) nonzero 
Id elements in the remaining place, along with the 
entity element J = (0, 0, --- 0) of the code group. 
Yo two of these N-tuples can occur in the same coset, 
r then their difference would be a code letter different 
om J and having fewer than three nonzero elements, 
1 impossibility if all single errors are to be uniquely 
rrectable.) We can then write out the code letters in 
horizontal line, and under each letter put its sums with 
e (M — 1)N coset leaders other than J. Since for NV 
ven by (1) this procedure exhausts the set of 1” pos- 
le received N-tuples, the code is close-packed. Each 
ceived set is found in one and only one column, and 
is either the head of that column or differs from it in 
ily one place. Each such N-tuple is to be decoded into 
e letter at the head of the column in which it is found. 
Given a received set (%, ws, ++: Uy), one can form a 


tuple (v1, Dees ee Vx), where 
N 

Vie a A; ;U;- (4) 
aS 


is known as a “corrector’’.” All elements of a coset 


ve the same corrector. A received N-tuple can be de- 
ded by calculating its corrector, looking up in a table 
e associated coset leader, and adding the inverse (or 
egative’’) of the coset leader to the N-tuple. When M 
prime,’ the matrix a;; can be so chosen that the corrector 
ecifies immediately which digit is to be changed and 
‘how much. 


II. CALCULATION OF CopDE PERFORMANCE 
IN A BALANCED CHANNEL — 


A balanced channel is defined in terms of its con- 
‘ional probabilities p;(j) of receiving the digit 7 when 
e digit 7 is transmitted. The probability 6 = p,(z) of 
rrect reception is the same for all digits, and the proba- 
ities 6 = p;(j) are equal for all 7 ¥ 7 and are indepen- 
nt of z. Errors in successive digits are assumed to be 
itistically independent. A physical model of such a 
annel will be described later. 


7R. Filipowsky, P. Portmann, and E. Scherer, “Improvements 
tained by a Quaternary Code and Decision System,” presented 
the Third AeroCom Sym., Rome-Utica, N. Y.; November 8, 
7. 
8}. Slepian, ‘‘A class of binary signaling alphabets,” Bell Sys. 
h. J., vol. 35, pp. 2038-234; January, 1956. 


In the balanced channel the probability px(V) of 
receiving the N-tuple Y when the code letter X was 
transmitted has the following useful symmetry property: 


OOO = 2.0) (5) 


where J is the N-tuple of all 0’s, and Y — X = Y + (—X) 
where (—X) is the element inverse to X under the opera- 
tion of group addition X + (—X) = I. All code letters 
are transmitted with equal relative frequencies. For such 
a channel the decoding procedure described above always 
picks the code letter with largest posterior probability. 
To evaluate the performance of our codes in this kind 
of channel we can use Slepian’s rules’ for computing the 
probability Q of correct reception of a transmitted letter 
and the rate R of transmission of information. Although 
Slepian derived them for the binary symmetric channel, 
they depend only on the symmetry (5) and on the group 
properties of the code, and they are therefore applicable 
here. 

To apply these rules one takes the table of M~” received 
N-tuples, described above, and assigns to each member 
of it the probability 6” 6” ” that it is received when the 
letter J = (0, 0, --- 0) is sent, where m is the number 
of 0’s in the N-tuple. One then forms the column sum 
Qx by adding these probabilities for all N-tuples in the 
column headed by the code letter X, including that for 
X itself. The probability Q of correct reception is given 
by the sum Q, for the first column, and it is easily seen 
to be 


Q= 0; = 67 4+ Ul — Ne? (6) 


The rate R of transmission 1s 


R= = ((N — k) log M + Y Qx log Qx] 


bits per digit, where we use logarithms to the base 2. In (7) 
the sum is taken over all the code letters X. 

This sum is not as formidable as it appears, for many 
of the Qy’s are equal. Indeed, we have 


(8) 


where m = m(X) is the number of 0’s in the code letter X. 
If we let v, be the number of code letters with m 0’s, 
the rate of transmission is 


Ox = Wm(X) 


1 “ | 
R= x, | (N — Bb log M+ Yo ratn log t™|> (9) 
m=0 


We have vy = 1, ty = Q, of (6). 

To compute Qx we must first count up, in the column 
under the letter X, the number of V-tuples having various 
numbers of 0’s. We recall that they are formed by adding 
the coset leaders to the letter X. We can form sets having 
one fewer 0 than X by adding any of the (WZ — 1) non- 
zero field elements to any of the m places of X occupied 
by 0. There are m(M — 1) ways this can be done. The 
sets having one more zero than X are formed by adding 
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to each nonzero element x of X its inverse (—2) under 
addition. This can be done in (VN — m) ways. Sets with 
the same number of 0’s as XY can be formed by adding 
to the (V — m) nonzero elements x of X any of the 
(lJ — 2) field elements differing from both (—.2) and 0. 
Including X there are [(V — m)(M — 2) + 1] sets with 
m 0’s in the column. Thus we get the column sum 
Ox = Trix) = m(M re Dawe fo 
+ [(M — 2)(N — m) + 118” 6°” 
Ag (N aS m)p”"*! Nie m e 


m(X). 


(10) 


ike = 


This way of calculating the column sum Q,y shows that 
it depends only on the number m of 0’s in X, and neither 
on the remaining digits of X nor on their arrangement. 


IIL. Tor NuMBERS ?,,. 


It remains only to calculate the number y,, of code 
letters with m 0’s. The »,,’s are important not only for 
calculating the rate of transmission, but also for des- 
cribing the structure of the group of code letters. They 
have been computed by Lloyd® for close-packed binary 
codes, and it is a simple matter to apply his method to 
our M-ary codes. 

We define v,, to be the number of elements having 
m 0’s and differing from some code letter in one digit. 
The total number of N-tuples with m 0’s is 

pale (Var =— py. (11) 
m 
The sets with m 0’s and numbered by v,, are formed from 
code letters with (m — 1), m, and (m + 1) 0’s by adding 
field elements in the proper places, much as we generated 
the elements in a column in calculating Qy. It is easy to 
modify that development to show that 


vy, = (M — 1)(m + Denes 
+ (N — m)(M — 2)v,, + (N — m + 1)p,-1- (12) 


Combining (11) and (12) we get the set of difference 
equations 


(M — 1)(m + 1)ensr + [((N — mM — 2) + 1)p,, 


- 
+ (N — m+ 1)y,-1 = (™ \cas —- yr". (13) 


They can be most easily solved by introducing the gene- 
; aes 
rating function 


N 
Ge) = Sve. 


s=0 


(14) 


98. Lloyd, “Binary block coding,” Bell Sys. Tech. J., vol. 36 


pp. 517-535; March, 1957. 
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Using the difference equations (13) one can then derive 
the following differential equation for G(z): 


/ Se 2 dG 
[M —1— (M — 2)z — 2’] ib 

4+ [V(M — 2) +1+4+ NeGe) =(M@—-—1+4+2)*. (5 
Since we know that vy = 1, we look for a solution that 


behaves like 2” as z approaches infinity. Independently 
of any relation between MW and JN, this solution is 


(2 + M — 1)? 
N(M — 1) +1 


1G = ME 1) Ne) Cre ae 
A = (NW —1) $174, B= (N —D/SsMG 
A+B=N. (16) 
This result agrees with Lloyd’s’ for M = 2, N = 2* — I, 


The total number of code letters is 

M* 
NG Sayer 
which must be a power of M7. Hence the length N of the 


code letters must be given by a formula like (1). If we 
expand G(z) in powers of z we get 


Ge) = 2° + EN(N — 1)\(M — 1)?" * + 00"). 


G(z) = 


GW) =a, 


s=0 


(17) 


(18) 


Therefore vy_; = vy-2 = 0, and all code letters except J 
have at least three nonzero digits. The general term in 
the expansion of G(z) yields the number of code letters 
with m zeros: 


m= a {(")ear — i)" 4 NOE — 1) 


: Bi A ) ae oa 
mai pomiiey cee) (=a) » (9) 
provided one takes the binomial coefficient to equal 0 
when its lower index is negative. In the simplest case 
of k = 2,N = M + 1, we have B = 1, A = M, and 


a (iat he 


— (-1)"""(M — 1)\(M’ — 1 — Mm)). (20) 


The results of this section do not depend on M being 
a power of a prime, and one is led to speculate whethe1 
close-packed, single-error correcting codes exist for com- 
posite integers MW. That a group code of this kind cannot 
exist for M = 6, k = 2, N = 7 becomes apparent when 
one tries to set up a basis for it. An Abelian group of 
order 6 has one and only one element of order 2. If we 
call it a, a@ + a 0. Now consider the digit sets 
(a, 0, 0, 0, 0, x, y) and (0, a, 0, 0, 0, 2, w). Since for these 
to be code letters they must have three nonzero elements 
none of x, y, 2, and w can be 0. Each of these two septuples 
when added to itself must yield the identity letter 7, for 
otherwise there would be code letters other than J witk 
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ver than three nonzero elements. Therefore x, y, 2, 
Biemust be of order 2, and + = y =z = w= a. If 
then add the two septuples together, we get a code 
ter with two nonzero elements, an impossibility. 

if for a general composite integer M7 we consider ele- 
nts of the code group of the form 


N —k k 
(a, 05-0; ion eel aes eh 53), 


1 let @ run through the elements of an Abelian group 
order M, the digits in each place of the k-tuple on the 
ht run through an automorphism of the same Abelian 
yup. One may be able to draw conclusions about the 
stence of close-packed, single-error correcting group 
les for M-ary channels, M composite, by studying the 
tomorphisms of the Abelian groups of order 7, keeping 
mind that at least two of the elements of each k-tuple 
the right must differ from 0, except in the identity 
ter I. 


IV. A Mopet or A BALANCED CHANNEL 


A transmitter can send any of M narrowband signals 
duration T and of the form 


awt 


real part of F’;(t)e Sy ae, 


ere W is 2m times the carrier frequency. The complex 
velopes F’;(t) of the signals are orthogonal, as follows: 
Te 

[ PROF at = By 5:5 (21) 

0 
s is merely a way of specifying that the signals do 
t “overlap.” All signals are received with equal energies, 
t corrupted by white Gaussian noise and with unknown 
‘rier phases. The signals are passed through a set of 
parallel filters, each matched to one of the signals in 
» sense of detection theory,’® and each followed by a 
ear detector. The outputs of the M detectors are 
asured at the end of each transmission interval of 
ration 7’. Suitably normalized, these outputs are de- 
ted by (41, Yo, °°: Yar) and are described by the prob- 
lity density functions”® 


) = gO, y) = ye” ” (signal 7 absent) 
) = gd, y) = ye” **?’I(dy) (signal j present) (22) 
Yin 0; WES RSS UE 


e outputs are statistically independent. Here d = 
2h /N, is defined as the signal-to-noise ratio, where 
che energy of the received signal pulses and N, is the 
tral density of the noise. 

The M outputs (y:, +++ yar) are fed to a “decider,” which 
its the digit corresponding to the largest of them. For 


°C. W. Helstrom, ‘Statistical Theory of Signal Detection,”’ 
gamon Press, Ltd., London, Eng.; 1960. See ch. 5, pp. 129-165. 
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Fig. 1—Probability of correct reception for quaternary code. 


this channel the conditional probabilities are’ 


B= pl) = [ ad,y) dy 
0 


.f gO, Yo) dy2 --° il QO, Yar) AY ar 
0 0 


, | fi ; ; exp [—rd'/2r + I], 


6 = pifj) = 1 — 6)/(M — 1), 


We have calculated the probability Q of correct recep- 
tion and the rate R of transmission for a quaternary 
code*” for which M = 4,k = 2, N = 5. For this code, 


if SU By 0, Ey 30, Vy oS 1K), ty = 18. (24) 


] #1. 


Vy = 


We used the above-described model for the balanced 
channel, with WZ = 4. The results are plotted in Figs. 1 
and 2 as the solid curves marked a = 0. For purposes 
of comparison we have plotted as dashed lines the cor- 
responding results for a system transmitting the 64 sets 
of three quaternary digits without coding. For these the 
probability of correct reception is simply Q = 6°, and 
the rate of transmission is equal to the capacity of the 
balanced channel, as follows: 


Rk = log M + 8 log 6B + (M — 1) 6 log 6 (25) 


1 ©, W. Helstrom, “The Performance of Communication Chan- 
nels with Orthogonal Signals,’ Westinghouse EHlec. Corp. Res. Rept. 
412FF471-R1; August 28, 1959. See also 8. Reiger, “Error rates 
in data transmission,’ Proc. IRE, vol. 46, p. 919; May, 1958. 
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Fig. 2—Rate of transmission for quaternary code. 


bits per digit. The coded digit sets have a higher proba- 
bility of being correctly received, but they convey infor- 
mation at a smaller rate than the uncoded sets. 


V. Tut BALANCED CHANNEL witH NULL ZONE 


The use of null zones to increase channel capacity has 
been discussed by Bloom, et al.,"” for the binary symmetric 
channel. A simple kind of null zone for the /-ary balanced 
channel can be set up by instructing the decider to emit 
an erasure, denoted by ‘‘H,’’ whenever all M detector 
outputs ¥1, °-* Ya, fall below a certain threshold y = a. 
Otherwise the decider emits the digit corresponding to 
the largest output. The transition probabilities for this 
new channel are as follows:"* 


foo) 


Via 


B-=<p. 0) I q(d, y) au( | q(O, x) ar) 


see Be 


d sa j rd /2r + 1)] 


exp | 


ge ; avr +1) ; 


M—-1 


y = p{£) = / q(d, ¥) anf f q(O, x) a) 
= [t — Od) a) |G aa ame 


= p> A= 8 = (Ol = 1), Ata O) 


2 P. J. Bloom, ef al., “Improvement of binary transmission by 
null-zone reception,” Proc. IRE, vol. 45, pp. 963-975; July, 1957. 


Qa, 6) = fv **?*T (oc) de (27 
B 


5 ~ : 13 
is Marcum’s Q-function. 


There exists a value @ of the threshold for which thi 
channel capacity 
a 
ay 


4-6 log 8:1) Gis alo 1ogme 


Cia) = (1 — y) log ( 
(28) 


is a maximum. This maximum capacity C(@) was founc 
to be only slightly larger than C(O), the capacity of ¢ 
channel without a null zone, for the cases M = 2, 4 
studied.” 

Since the single-error correcting codes described above 
also correct double erasures, it might be thought that 
they would yield a better performance with some non- 
vanishing null zone than with a = 0. To test this, we 
have calculated the probability Q of correct reception and 
the rate R of transmission when the quaternary code 
(M = 4,N = 5, k = 2) is applied to such a balanced 
channel with null zone.'* Using a digital computer, these 
were evaluated for a range of values of d and for a = 0 
to 3 in steps of 0.5. The results are plotted in Figs. 1 and 2. 
(The curves for a = 0.5, 1.0, 1.5 he between those for 
a = 0 anda = 2.) It was found that both Q and R de- 
crease as the threshold a increases, slowly at first, and 
then more rapidly as a exceeds d. Thus for M = 4, with 
the type of signals and noise considered here, it is preferable 
not to use a null zone when applying this kind of code, 
even though it does correct double erasures. 

In our analysis’* we postulated a decoding procedure 
that chooses the code letter with the greatest posterior 
probability in view of the digits emitted by the decider. 
For received sets with no erasures it is the same as that 
described above. For sets with two erasures the missing 
digits are calculated from the linear relations (4) for the 
code (k = 2). 

If a set contains a single erasure, the remaining four 
digits may or may not be part of some code letter. If 
they are, that letter has maximum posterior probability 
and is emitted by the decoder. If they are not, there 
are four code letters with equal and largest posterior 
probabilities. We assumed that the decoder picks one of 
these four with probability 1/4. Thus a set with one erasure 
can be decoded by filling in the erasure with a quaternary 
digit picked at random, after which the decoding pro- 
cedure for sets with no erasures is applied. If three or 
more erasures occur, all but two are filled in with quater- 
nary digits chosen independently and at random, and the 
remaining two digits are calculated from the linear check 
relations (4). 


8 J. I. Marcum, “A Table of Q-Functions,”’ Rand Corp., Rept. 
RM-339; January 1, 1950. 
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Under this decoding procedure the received sets retain 
ne symmetry property (5) required for the validity of 
lepian’s rules for calculating Q and R. It is then a matter 
i listing all possible received N-tuples, including those 
ith one to five erasures, underneath the code letters 
ito which they are decoded. To each is assigned the 
robability B"y"5"-”-” of being received when the letter 
= (0, 0, --- 0) is sent, where m is the number of 0’s 
nd » the number of E’s in the N-tuple. The column 
ims Qx are then formed as before, but with one modifi- 
ation. For those V-tuples involving erasures and decoded 
y a chance device, the probability B”y"5*~-""” is weighted 
‘ith the probability that the letter X is picked by the 
ecoder. These weights are 1/4 for n = 1 and 3, 1/16 
or n = 4, and 1/64 for n = 5. Since the sum Qy again 
epends only on the number of 0’s in the code letter X, 
ne can use (9) for the rate of transmission. 
~The tedious task of counting up ail the terms is de- 
sribed elsewhere.’* We only list the column sums for 
he case M = 4,N = 5: 


5 = B + 156°5 + 5B*y + 608°75/4 
Os ye 108 'y. /P 587" /16 Pp °/64. 
40. W. Helstrom, “The Performance of the Scherer Code in 


Quaternary Symmetric Channel,’”’ Westinghouse Elec. Corp. Res. 
ept. 412FF471-R2; September 1, 1959. 
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rT, = 36°0 + 76°8 + 685 + 7(28d + 36'S) 
+ (68°65 + 186°" + 3085° + 68°)y/4 
+ (38°02 680 SP oy 
+ (6° + 685 + 38°)y'/4 
+ (28 + 38)7'/16 + 7°/64. 

468°5° + 966° + 36° + (466° + 6)y 
+ (126°8° + 2868 + 206*)y/4 
+ (688° + 46°)y* + (486 + 66)7"/4 
+ (8 + 46)7°/16 + 7°/64. 

566° + 116° + 5yé + (2086 + 406')y/4 

+ 10775? + 108°y°/4 + 567/16 + 77/64. 


The probability of correct reception is Q = 7s, and the 
rate F is calculated from (9). 


I 


Ty 


Co = 


(29) 
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Linear-Recurrent Binary Error-Correcting Codes 
for Memoryless Channels" 
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Summary—tThis paper concerns the analysis of recurrent-type, 
arity-check, error-correcting codes for memoryless, binary sym- 
etric channels. These codes are defined to consist of message 
>quences augmented by insertions of r successive parity digits 
very ) successive message digits. An analysis framework is 
stablished for the codes which consists mainly of a parity check 
\atrix [M] and a message difference vector [N]. Within this frame- 
ork, a decoding scheme is developed which renders the codes 
upable of correcting any set of <e errors in m/b successive (b + 7)- 
igit blocks of coded message sequence, where e is maximized 
yer all parity-check codes having the same redundancy ratios 
ad maximal lengths of dependence among their digits. An example 
given of a linear-recurrent code which has a lower probability 
‘error than the best comparable block code, and several out- 
anding problems are discussed. 


* Received by the PGIT, February 15, 1960. This work was 
ipported by Air Force Rome under contract No. AF 30(602)-1915. 
+ Elec. Engrg. Dept., Montana State College, Bozeman. 


INTRODUCTION 


OST of the results to date in the theory of error- 

correcting codes for memoryless, binary sym- 

metric channels have concerned block codes. 
These codes have almost always been of the systematic 
type, where n-digit code words are divided up into k 
information digits and (n — k) parity check digits. Such 
codes have the advantage that they can be simple to 
instrument, they can be designed to have a good amount 
of error-correcting ability, they possess enough mathe- 
matical structure to render them amenable to analysis, 
and their independence between blocks gives them a 
decoding stability that is not easily obtainable with 
“recurrent-type’’ codes whose correct decoding at one 
time tends to depend upon correct decoding at all previous 
times. 
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Nevertheless, it is known that under certain circum- 
stances essentially nonblock codes can also be designed 
to have easily instrumentable and peculiarly efficient 
error-correcting properties.’ The purpose of this paper 
is to provide a first look into these properties for the 
memoryless binary symmetric channel case. This is done 
by first establishing a general analytical framework for 
a regular, parity-check type of recurrent code, and then 
exhibiting an example of one such code which has a 
greater error-correcting capability than the corresponding 
best block code.* 


THe GENERALIZED CoDER AND CHANNEL 


The generalized coding scheme that is considered in 
this paper is illustrated in Fig. 1. There a random sequence 
of binary message digits is fed into an m-place shift 
register R in blocks of 6 digits per block. This causes 
corresponding blocks of b digits to be shifted out of the 
right end of R into the channel for transmission. After 
each new b-digit block is located in R, r successive parity 
check digits are formed in the parity check circuit P, and 
they in turn are moved out into the channel for transmis- 
sion. The parity digits have values such that their modulo 2 
sums with the corresponding message digits are 0. Thus 
the output of the coder consists of alternations of r suc- 
cessive parity digits from P, and b successive message 
digits from R. The code is called a linear-recurrent code. 

The channel is memoryless, symmetric, and binary. 
Thus it is completely specified by stating that the prob- 
ability of a coded message digit’s being put in error is Pe, 
where 0 < Pe < 1/2. 


THE GENERALIZED DECODER 


Hereafter, let us assume synchronous circuitry every- 
where. With this understanding, the generalized decoder 
is given in block diagram form in Fig. 2. The system there 
is interested in decoding and correcting only 6 of the 
possibly-corrupted received message digits at a time. The 
idea behind this, roughly speaking, is to provide a decoder 
which is not forced to cut off from consideration any of 
the parity information that is available on any received 
message digit while it is decoding it. It is also to provide 
a coding system for which it is not necessary to confine 
all the parity information on each block of m, x,-digits 
to just m(b + r)/b consecutive coded message digits as 
in the corresponding block-code case. 

Now let us make this intuitive idea precise. Suppose 
in Fig. 1 that a message sequence 


AE st 
aM — 40 Ord MESSE ies ies eee i+; Oro oO 


1D. W. Hagelbarger, 
burst-correcting, binary codes,” 
969-984; July, 1959. 

2W. L. Kilmer, Some Results on Best Recurrent-Type Binary 
Error-Correcting Codes,’ ’ 1960 IRE IntTEerNatTionaL CONVENTION 
REcoRD, pt. 4, pp. 135-147. 

3 A.B. Fontaine, and W. W. Peterson, “Group code equivalence 
and optimum Codes,” IRE Trans. on InFrorMaTION THEORY, 
vol. IT-5, pp. 60-70; May, 1959. 


“Recurrent codes: easily mechanized, 
Bell Sys. Tech. J., vol. 38-4, pp. 
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Fig. 1—Generalized coder. 
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Fig. 2—Generalized decoder. 
is put into the coder in the order of increasing subscripts, 
and that a corresponding coded message sequence 
C(X) = -:- Ci+iCk-1+7 °° * 


{k = (b + r/b)m] is put out of the coder. Let the noise 
in the channel between the coder and decoder be repre- 
sented by 


aes 


N -— cue) .e Nes jMe-143 oe y+; BOR Fy 
where n; = 1 if an error occurs in the zth place of C(X), 
and n; = 0 otherwise. Also let the swm of any two binary 


sequences X and Y be denoted X + Y, and define this 
sum to be the sequence 


VE 


ei4eiei-1 °° 6) 


where z; is the modulo 2 sum of x; and y;. Then the received 
message in Fig. 2 can be written as C(X) + N. 

Suppose now that all the noise digits n; up to and 
including some n,, are 0, that at least one of n;.1, +++ , Nise 
is 1, and that n,; is in the last of a string of r parity digits. 
The coded message digits that are affected by n;,, and 
subsequent noise digits move into the decoder, a digit 
at a time, in blocks of 6 + r digits per block. 

At this point, recall that the decoder is interested in 
decoding only b, x,-digits in C(X) + N at a time. Further, 
note that each x, digit can have parity digits in at most 
m/b successive (b + r)-digit blocks dependent upon it 
(this can be seen from Fig. 1), and that these blocks all 
either immediately precede or include the block of the 
x; in question. Then for the case above, the decoder 
proceeds to investigate only the m/b, (6 + r)-digit blocks 
of the received message which end with the digit in the 
C;+, position. It does this by coding up for trial considera- 
tion all the m-digit sequences X,; which could possibly 
have gone into the coder to cause the part of its actual 
output sequence 


x; digits parity digits 
eS, 


i De yay O80 Gronan 00 oC Ce C(X»). (1) 
(m/b)th block of Ist block of 
b + r digits b + r digits 
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bel each of the 2” trial coded sequences, C(X;), 
ts eae 

In Fig. 2 the trial coding is accomplished in T by T’s 
uwing on the contents of D. At each point in time, D 
assumed to contain the m perfectly-corrected x; digits 
ich just precede the x; digits in C(X,). (The impli- 
‘ions of this assumption will be discussed later.) 


Now label the subsequence, 
Ni +b “ee Wig eae) este nN e101. 8 Nn 
Sy aN 8 


(m/b)th block of 1st block of 
b + r noise circuits 6 + r noise digits. 


(2) 


ich is contained in the particular N defined above, N>. 
en the S part of the decoder in Fig. 2 adds each C(X,) 
‘med in TJ to the received sequence C(X,) + No. The 
sult is a set of 2” sequences: 


S; = C(X;) of C(Xo) ae A 
ice the coder and decoder are linear, we have that 
(Xa) + CK) = CX; + Xo), la 


erefore, when X; = Xo, S; = No, and otherwise 
= N, + C(X; + X,). 
At this point let us suppose we have a code such that 
> every 2 for which (X; + X,) contains at least one 1 
one of its first 6 places, 6 < m, C(X; + X 5) contains 
minimum of (2e + 1) 1’s over-all. Suppose also that 
2 decoder selects at each “stage”? of decoding the 
st 6 digits of an X; for which the corresponding 
(X; + X,) + N >] contains a minimum number of 1’s. 
ven over successive stages of decoding, the code in 
estion provides e-error correction within any set of 
./b) consecutive (b + r)-digit blocks of coded message 
rits. It is clear that this provision is on a maximum- 
elihood-detection basis. 
At first it might seem that for any given code, the 
aller 6 is, the fewer X; there are that differ from X, 
one of the first 6 places, and hence the fewer C(X; + Xo) 
ere are to consider in establishing the e in (2e + 1). 
it the fact is that nothing is gained by reducing @ to 
ything less than b. The reason is simply that for 6 < b, 
the parity information that pertains to the first b 
rits of X, is used in the selection of the @ digits. Hence 
ecting only 6, x;-digits at each stage of decoding does 
t allow the introduction of more parity information on 
e remaining x,’s of the block during the next stage of 
coding. So from now on, we will assume that 6 = b.* 


(1 = des; Oey i 


@ = 1,2,-- 


Quasi-Brest CopEs 


A logical question to ask at this point is: how does one 
ect parity relations for P in Fig. 1 so as to maximize é 
a function of m, r, and b? This question may not be 
uivalent to asking which codes provide minimum prob- 
ilities of error (7.e., are ““optimum’’). For consider the 


4 The idea here of decoding by stages is related to that contained 

J. M. Wozencraft, “Sequential Decoding for Reliable Com- 
nication,’ Elec. Engrg. Dept., Mass. Inst. Tech., Cambridge, 
ASS. 


(block-type) group code analog: there it is not always 
true that the optimum codes are those which have the 
maximum e’s,’ neither is it always true that the codes 
which are optimum for small p, are also optimum for 
large p, (the dividing line between small and large p, 
is usually about 0.3).° But these statements are usually 
true. On this basis, it seems justifiable to define quas?- 
best linear-recurrent codes as those which have maximum 
e’s (e either an integer or an integer divided by 2). 

At this point we omit a rather obvious proof, involving 
simple rearrangements of parity and message digits, and 
simply state that a quasi-best linear-recurrent code is also 
one which can correct any set of e errors in (b + r)m/b 
successive coded information digits, where e 1s now maximum 
over all equally-redundant parity check codes whose parity 
digits are not dependent upon any more than m successive 
message digits. It 1s important to note in this context 
that if for some code m = b, and P is not fixed but varies 
periodically with period 7, the code in question is simply 
a usual block type of parity check code. Thus the set 
of all linear-recurrent codes properly contains the set of all 
parity check block codes. 

To proceed with the above definition of quasi-best code 
as our criterion, we now desire codes which meet the 
following specifications: for given m, 7, and b, the minimum 
number of 1’s in C(X;) + C(X,), taken over all X; 
for which (X, + X,) contains at least one 1 in its first b 
places, is an absolute maximum. In other words, for all 
X,; in question, if (X; + X,) contains q l’s, the parity 
digits in C(X,) differ from those in C(X,) in at least 
7 — q places, for maximum possible 7. 

In order to pursue the desired codes in mathematical 
terms, let us define an m-digit parity check vector 

fe GU C20 70h 
as a representation of the jth parity check relation in 
P of Fig. 1 as follows: if the jth parity digit being formed 
is dependent upon the digit in the 7th place of R, pi = 1; 
otherwise p; = 0. The significance of the 7 here is that 
in the general case the parity relation in P varies over 
some period. 

Next let us formulate an expression which, for any 
particular code as defined by a sequence of p’’s, exhibits 
the places in which the parity digits of a C(X;) differ 
from those in C(X,). To do this, define C*(X; + X,) = C* 
to be C(X; + X >) with everything but parity-position 
digits deleted; and define N¥ = (X,; + X,). Then the 
1’s in C% indicate just where the parity digits in C(X,) 
differ from those in C(X,). 

The first digit in C* is given by the modulo 2 value of 
the ordinary matrix product, . 

Nit, 


k+1, k+1 
[ 


Pm Dm =-1 (3) 


SD ol 


5 D. Slepian, “A class of binary signaling alphabets,’’ Bell Sys. 
Tech. J., vol. 35, pp. 203-234; Sec. 1.10; January, 1956. 
6 Fontaine and Peterson, op. cit., pp. 67-68. 
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where p‘*' is the parity relation used to establish the 
first c; in (1), and nx, --- , n* are the Ist to the bth 


digits respectively in V*. The second digit in C* is given 
similarly by the modulo 2 value of 


FS 


Continuing in this manner, the first r digits in C*% are 
given by the modulo 2 values, from bottom to top in 
the order of their occurrence, of the numbers in the column 
sequence equal to 


k+2 k+2 


[Pn Pm-1 (4) 


k+2 
Sry ares 


k+r 


[5 k+r * 
|e sot Dregeualleigs 


(5) 


pi? 
Finally, since the last digit in C* is the last parity digit 
in the (m/b)th block of digits in (1), all the digits in C* 
are given by the modulo 2 values, from bottom to top 
in the order of their occurrence, of the numbers in the 


k+1 * 
vik DPin—b+1 l Nis 
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in (6) and (7) are 0. Note that in (6), the modulo 2 product 
of the bottom row of [M] and [N**] is just (4) above 
the product of the bottom row of (7) [7.e., the bottom 7 
rows of [/] in (6)] and [N¥*] is just (5) above; etc. Alsc 
note that the validity of (3) to (8) depends on the assump: 
tion that the previously decoded (m/b — 1) blocks contair 
no error. 

Now a most important feature of (6), which is just 
the upside-down transpose of C%, is that it is equal te 
the sum of those columns in [M] for which there corre- 
spond 1’s in the same rows of [N**], counting from the 
bottom up. This is because postmultiplying [7] by a 
column vector amounts to forming the linear combination 
of its columns specified by the postmultiplier. Thus the 
nature of C* and hence the value of e is essentially estab- 
lished by the columnar sum properties of [J]. This fact 
is basic to the study of linear recurrent codes. 

Now we are in a position to discuss exactly what it 
is that we desire: For given m, r, and b, we want to select 
a periodic sequence p’, p, -*: p’, with z finite and such 
that for k in (6) equal to 1, 2, --- , or 2; (kK + 1) wn (6) 
equal to 2, 3, --- , 2, or 1, respectively, etc; the sum of any 


column sequence equal to subset of q columns of [M] contains at least GQ — p) Is, 
k+mr/b ktmr/b | 1 k+mr/b k+mr/b ae 7 
Pm cote m—b+1 ! | Po cae Pi _ One 
| | | < 
| | I 
! 
k+(m—-1/b)r+1 k+(m-1/b)r4+1 u ! k+(m/b-1)r#1 k+(m/b—-1)r+1 
on ween mabe | | ! b a Pi i ‘ Nee 
* 
Ny, ) (6) 
k+2 k+2r k+2r k+2 é 
m Drab [Derik [Oj | ‘ 
| 
| ne uu 
| k+r+1 k+r+1 k+r+1 k+r+1 
| m : DPm-b+1 Dire ies Pm—2b+1 
| | k+r k+r 
Dee ae, Dip 
} ! C 
} 
k+1 k+l 
| | Din : [Oris al 


[M,., M2 M, »m/b 
M22 Ms mi 
= [ela G2) 
Mesa Me ae 
Le Myo, m/b 
= [M)INe*] = [cr] (8) 


where [C¥*] and [N**] are just the transposes of the 


C* and N* sequences, respectively, in reverse order from 
left to right, and all the elements below the solid lines 


for maximum possible 7. In case it is required that the 
parity check vector be fixed at p, it is sufficient to select 
only the top row of [M] to completely specify it. Having 
chosen the required list of p’s, the complete logical specifi- 
cation of the coder in Fig. 1 follows immediately from 
the definition of a parity check vector. 


RELATION TO BLocK Coping THEORY 


At this point there remains the problem of specifying 
a procedure for constructing p’s according to the criterion 
just given. It is interesting to note that this problem is 
almost identical to a corresponding one, given by Sacks,’ 


7G. E. Sacks, “Multiple error correction by means of parit: 
checks,” IRE Trans. on Inrormarion Turory, vol. IT-t, ee 
145-147; December, 1958. 
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’ constructing e-error-correcting block-type group codes. 
cks’ problem, which is still unsolved in any satisfactory 
ase, 18 to choose (n’— r) binary sequences (the code 
aracteristics) to serve as the top (n — r) rows in the 
by r matrix such that no subset of < 2e of the rows in 


columns 
1 ) agi r 
1 
2 cae 
rows consisting of the 
(n — r) code character- 
: istics (9) 
rows 
n—-rt+1 : ; 
r by r unit submatrix 
oe 


e linearly dependent. Because of the unit submatrix in 
), this problem is equivalent to the one of choosing 
_— r) rows such that no sum of any q of these rows 
ntains fewer than (2e + 1 — q) 1’s in it. Clearly, this 
oblem is almost the “transpose” of the general one 
at is stated above for linear-recurrent codes (in terms 
the columnar sum properties of the [/] matrices for 
ese codes). The only differences are that: 1) in the case 
linear-recurrent codes all the below-diagonal elements 
[M] have to be 0, whereas in the upper submatrix of 
), full-length sequences are allowed; and 2) for linear- 
current codes at least one of the rightmost 6 columns 
{M] has to be included in every one of [J/]’s relevant 
lumn sums, whereas no such restriction is present in 
e group code problem. 


EXAMPLE OF A Brest LINEAR-RECURRENT CoDE 


An example will now be given of a quasi-best linear- 
current code which is actually best in the following 
nse: over a memoryless, binary symmetric channel this 
de provides the lowest probability of error that any 
ear (parity check) code whose interdigit dependence 
tends over a maximum of 12 digits possibly can. This 
mains true as long as no errors are made in the decoding 
eration. 

The code is defined by the parity check vector p = 
1001, so the [M/] for the code is 


tet OvOn1 | 
linea Lesh 
Puts Ou) 
tie 
iyi 


L Del 


This matrix has the property that every sum of qg of 
- columns that includes its rightmost column contains 
least (5 — g) = (2e + 1 — q) I’s. Hence, according 
the decoding scheme given above, the code has the 


ability to correct all single and double errors which can 
occur in any set of (m/b) = 6 consecutive blocks of coded 
message digits, where there are (b + r) = 2 digits per 
block. This compares favorably with the best (12, 6) 
block code* which can only correct all single errors, 50 
of the 66 possible double errors, and 1 of the 286 possible 
triple errors that can occur in any block of 12 digits’. 

The decoder in this example decides on b = 1 message 
digits per stage, where 2° = 64, 6-digit X,’s are coded 
up for trial consideration at each stage. It is probably no 
surprise that the 64 X,’s here do not imply an excessive 
amount of equipment in the decoder. For the 7 circuit 
can consist essentially of: 1) a 6-digit linear feedback 
shift register capable of generating a (2° — 1)-length 
binary sequence,’ which sequence automatically contains 
every 6-digit binary sequence as a consecutive-digit sub- 
sequence; and 2) a single coder of the type shown in Fig. 1 
to code up the output of the shift register. S in Fig. 2 
can consist simply of a 12-digit sequence comparator (2.e., 
a 2-digit modulo 2 adder and a count-to-3 counter), and 
a storage register for storing at any time ¢ the best trial 
X,; found from the beginning of the decoding stage occur- 
ring at ¢ until ¢. 


SoME OUTSTANDING PROBLEMS 


Up until this point it has been assumed that mistakes 
never occur in the decoding process. This assumption has 
enabled us to avail the following idea, which essentially 
restates what has gone before, but from a slightly different 
point of view: At each stage of decoding, the block of b 
message digits being decoded has first of all r parity 
digits generated which are (effectively) dependent solely 
upon it; then the block being decoded has r more parity 
digits generated which are (effectively) dependent only 
upon it and the succeeding block of message digits, where 
the succeeding block is chosen so as to provide a best 
match to the received coded message, and so on, until 
finally the block being decoded has an (m/b)th set of r 
parity digits generated which are (effectively) dependent 
upon it and the (m/b — 1) succeeding blocks of message 
digits, where each of these succeeding blocks is chosen 
so as to provide a best match to the received coded mes- 
sage. Thus, under the assumption that no mistakes ever 
occur in the decoding process, this process amounts, 
essentially, to the successive generation and subsequent 
deciphering of an aggregate of redundacy patterns (the 
blocks of r parity digits), where this aggregate is always 
“weighted” most heavily toward that end of the message 
which is currently being decoded. 

With this scheme of decoding, if the results previous to 
any stage leave incorrect digits in D, some very dis- 
rupting things can happen. In order to mathematize them, 
suppose that at some stage of decoding the mistakes in D 


2 8 poe 6 message digits and 6 parity digits per 12-digit 
ock, 

9 Fontaine and Peterson, op. cit., p. 68. 

10 B. Elspas, “The theory of autonomous linear sequential 
networks,” IRE Trans. on Crrcuir THrory, vol. CT-6, pp. 45-60; 
March, 1959. 
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of Fig. 2 are indicated by the column vector with a quasi-best code, in all probability the future de- 
‘aa coding operations will be hopelessly garbled, but the 
| dn UE wae probability is not 0 that the decoder will eventually get 
Bete ar PA (10) back on the right track."* If garbling does occur, it 1s 
| : | highly probable that no trial C(X;)’s will perfectly match 
a the received (C(X,) + No)’s for a very long time, even 
where d; = 1 if the digit in the zth place of D is incorrect, though many or most of these N,’s naa, be 0. The ong 
and d; = 0 otherwise. Then instead of (8), we have, bright aspect here is that for low-noise channels, this 
after the manner of (8), fact might serve as a kind of injinite-degree error-detection 
; criterion! 
[CP] = [MI[IN**] + [M’][D*], (11) Another outstanding problem is: how are good, quasi- 
where best, or best codes constructed for general m, 7, and b? 
| | 
| 
| | 
= ee ee | ee ere es 
k+(m/b-1)r k+(m/b—-1)r ! | 
b 1 | | 
| 
| 
k+(m/b-2)r4+1 k+(m/b-2)r+1 | 
; 5 Di \ | 
erent See ee See Pa ene ee 
| 
[M’] = : 
aoa SS 2S SSS aS =, SS Se) Se eS eS ee Dh one aoe oe 
fA, : De arat ji ia | 
| 
| | 
ator et jie Sie +r+ | 
HA Desay es es. : Pi : 
~ a 7 ee 7 *% me me ay | = ae "i = at as Sw 
kt+r k+r k+r kt+r k+r k+r 
m—b Pm-2b+1 | Pop b+1 Po “pi 
| . 
k+1 k+1 ' I k+l k+l k+1 k+l 
Pm-> SD mob | 2b b+1 jis FO th 


with all the elements above the solid line 0, and [D*] 
is as defined in (10). The validity of (11) can be seen 
by recalling that when the first r parity digits are formed 
for some trial X; at any stage of decoding, the first b 
digits of that X; have already moved into the leftmost 
b places of T’s coder register; and so on, until at the 
end of the X; trial, all m of the X,; digits have moved 
into T’s coder register. 

Now if [D*] does not contain all 0’s, the weighted- 
redundacy idea stated above breaks down almost com- 
pletely. This is reflected in the analysis by the fact that 
in this case the following new criterion has to be substituted 
into the analysis scheme: The [C**]’s, for all z for which 
(X; + X,) contains at least one 1 in the first 6 places, 
must all differ from each other in at least (2e + 1 — gq) 
places regardless of what [D*] happens to be. This must 
be so if e-error-correction is to be obtained over every 
(m/b) successive (b + r)-digit blocks of the coded mes- 
sage sequence. 

Thus far, this new criterion has seemed especially 
intractable—at least the author has not yet been able 
to do anything very satisfactory with it. The following 
is all that can be reported: If a decoding mistake occurs 


It should be mentioned here that the example of the 
previous section was not obtained in the presence of any 
such generality. Rather, it was one of a set of fixed- 
parity-check codes that were derived for all m < 6 and 
for some m > 6 by exhaustive and trial-and-error methods. 
All that can be said from the results is that apparently 
a tight description of the summability properties (in the 
number-of-l’s sense) of nonperiodic, nonrandom, binary 
sequences is needed before further general progress can 
be made. The author is reminded, in this regard, of the 
work of Golomb,” Zierler,'* and Rothstein,’* but would 
at present make no claims in these directions. 


4 For No could be 0’s in the message-digit positions, and 1’s 
in just those parity-digit positions where corresponding 1’s were 
indicated in [M/’] [D*]; and this could happen enough times in 
succession to allow enough perfect matches between correct trial 
C(X ;)’s and received (C(Xo) + No)’s to reduce [D*] to all 0’s as 
required. 

2S. W. Golomb, “Sequences With Randomness Properties,” 
Glenn L. Martin Co., Baltimore, Md., Final Rept. on Contract 
No. SC-54-33611; advance copy dated June 14, 1955. 

18.N. Zierler, “Several Binary Sequence Generators,’ Lincoln 
Lab., Mass. Inst. Tech., Lexington, Mass., Tech. Rept. No. 95 
September, 12, 1955. : 

“J. Rothstein, “Analysis of binary time series in periodic 
functions,” IRE Trans. on ELectronic Computers, vol. EC-8. 
p. 229; June, 1959. 
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| SUMMARY AND CONCLUSIONS 

In this paper an analysis framework is established for 
lear-recurrent codes meant for use over memoryless, 
nary symmetric channels. Within the context of the 
umework, consisting mainly of the matrices [/] and 
**], a criterion is given for constructing ‘“quasi-best’’ 
des capable of correcting any set of Z e errors in (m/b) 
ecessive (b + r)-digit blocks of a coded message se- 
1ence, for maximum possible e. This criterion is compared 
a similar one given by Sacks for corresponding best 
ock codes. An example is given of a linear-recurrent 
de which has a lower probability of error than the 
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corresponding best block code. Finally, several problems 
are suggested, mainly with a view towards finding more 
efficient codes for practical uses, and a general way to 
approach channel capacity at a faster rate (with respect 
to increasing maximum length of dependence among coded 
message digits) than is possible with ordinary block 
codes. The latter might reasonably be expected, for the 
“weighted-redundacy” and ‘“‘increased-effective-block- 
length” effects that are achieved by linear recurrent 
coding would seem to suggest that as m increases, linear- 
recurrent codes get better and better than their counter- 
part block codes. 


Probability Density Functions for Correlators with 
Noisy Reference Signals” 


G. M. ROEF anp G. M. WHITE}, mMempBer, IRE 


Summary—Recently, correlation functions have had to be con- 
dered where both the reference waveform, which is usually the 
sired signal, and the input waveform are masked by different 
mples of additive noise. In this article, we derive the probability 
snsity function for the random variable 8 where 


B = 2 (As;,. a ING CBSs 4 “he INGE): 


he s;,, and s;, are the signal components, and N;, and N;,, are 
mples of Gaussian noise. 

Exact expressions involving Bessel and Whittaker functions are 
ven for several cases. Asymptotic expressions allow W(8) to be 
otted when these exact expressions cannot be obtained or con- 
niently evaluated. 


INTRODUCTION 


“\ ROSS-CORRELATION techniques of comparing 

two waveforms have been playing an increasingly 
YY important role in communication or radar systems. 
1 typical models one of the waveforms used in the cross- 
wrrelation computation is a clean signal unperturbed by 
gise. This can be considered as the reference waveform. 
he other waveform, which can be called the input wave- 
rm, is usually either Gaussian noise or the same signal, 
erturbed by additive Gaussian noise. In these cases, the 
itput of the correlator is Gaussian distributed and has 
zero or nonzero mean, depending upon the absence or 
resence of the signal. 


* Received by the PGIT, July 1, 1960. 
+ General Electric Res. Lab., Schenectady, N. Y. 


If the functions to be cross-correlated are bandwidth 
limited and the integration is performed for only a finite 
length of time, the integral normally used can be approxi- 
mated by a sum of products." The terms forming these 
products are amplitude samples of the two waveforms 
at the Nyquist intervals. The output of this summing 
circuit, if the reference is again a clean signal, is Gaussian 
distributed, having a variance of ko’, where k is the 
number of taps and o is the standard deviation of the 
noise distribution. If a signal is present, the mean is 
determined by the auto-correlation function of the signal. 
If there is exact alignment between the components, the 
mean of the output distribution is then 


k 
D8 
Sy 
t=1 
where s; is the signal component and & is the number of 
taps. 

In this article, we consider a more complicated situation 
where the reference waveform is no longer a clean signal, 
but is corrupted by additive Gaussian noise. This problem 
originated from a theoretical analysis of an adaptive 
waveform recognizer, and is similar, when there are no 
signal components present, to the problem proposed by 


1P. M. Woodward, ‘Probability and Information Theory With 
Applications to Radar,’ McGraw-Hill Book Co., Inc., New York, 
INGRY@ LOS: 

2C, V. Jakowatz, R. L. Shuey, and G. M. White, “Adaptive 
Waveform Recognition,’ presented at the Symp. on Information 
Theory, London, Eng.; Sept. 1, 1960. 
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D. G. Lampard.* We assume here that both the input and 
reference waveforms are bandlimited functions and that 
the correlation can be performed by taking the sum of 
products. Under these conditions, the output of the cor- 
relator is no longer Gaussian. For special cases, we have 
found exact expressions for the desired density functions 
which involve series of Bessel functions or series of 
Whittaker functions. These series are not particularly 
efficient for purposes of numerical computations, but 
alternative approximations based on a saddle point inte- 
gration are sufficiently accurate for most purposes. 


THE PROBLEM 


In many communication systems, it is desirable to 
compute the integral 
+i 


2WB(r) = ii x(dy(t +p dt (1) 


where x(t) and y(t) are two waveforms. 

If a(t) and y(t) are bandwidth-limited functions where 
the bandwidth extends from —W to +W, then the 
integral given in (1) can be approximated by 


= Sa, (2) 


where X; and Y; are k samples at the Nyquist intervals 
corresponding to T and W. 

A correlator that can perform this cross-correlation 
function is shown in Fig. 1. The taps of the delay line 
are placed at the Nyquist intervals, and there are k of 
them. The inputs a(t) and y(t) each may be noise or signal 
and noise. The noise in input x(t) is usually independent 
of the noise in y(t). Furthermore, since the taps are located 
at Nyquist intervals, the values of the noise samples are 
independent of each other. 


TO SUMMING 
CIRCUIT 


Fig. 1—Correlator. 


All of the special cases of interest may be included in 
the general case where the output of the correlator is 
written as 

k 


SP eee Ree 


1=1 +=] 


pas oh Nig) (BS, ct Nols 9) 


’D, G, Lampard, “The probability distribution for the filtered 
output of a multiplier whose inputs are correlated, stationary, 
Gaussian time-series,’ IRE Trans. on INFORMATION THEORY, 
vol. IT-2, pp. 4-11; March, 1956. 


January 
The N,,, and N,,, are Gaussian iincbiis and 
(Nii; 2) = Os On, (4) 
(Ny Na) = a; Oii, (5) 
and 
(Nive Nj.9) = postu Oat (6) | 


Thus the case where the signals in both channels are 
aligned can be obtained by taking p = 0, 8:2. = Siy 
and A = B = 1; while letting p = 0, and A = 1,B =0 
yields the case of signal plus noise in the reference channel 
and pure noise in the input channel. If the signals are 
not aligned, but the signal components are shifted m 
Nyquist intervals in the input channel, then s;,, = $;,2+m: 

It is convenient to denote the normalized output of 
the correlator by ¢: 


e=4 (7) 


and 
a” = 6,0y. (8) 


The density functions we desire are for the normalized @. 
Furthermore, the B in (3) can be adjusted so that 


k k 
DS Do Siu (9) 


R, the measure of the signal-to-noise ratio, is defined as_ 


he jee 
RSS Ss (10) 
Ce iy O0 i=1 
If o, = o,, then 
2B 
R N, (11) 


where H is the signal energy and N, is the mean noise 
power per unit bandwidth. This definition is the equiva- 
lent to Woodward’s R.’ It is also convenient to define 


oni aad 
UKs 


PE se 2 8s 
Kl =" ps) he) (ee) 


(12) 


(13) 


DERIVATION OF THE Density FUNCTION 


In order to derive the density function W(¢), we start 
with the joint density function W, (a;, y;) for the variables 


X ; A Dies Nas 
AUR - = : oF et = Aix meee 
xz z Ma (14) 
Ve B tai N; y 
yo = t= Se 4 = hs, ty 
Oy Cy Oy 


where we consider an ensemble of correlators of the type 
shown in Fig. 1. The terms as;,, and bs;,, are treated as 
constants of the ensemble, and the noise terms are assumed 
to be Gaussian and n;,, correlated only with n;,,. The 
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int density function is therefore 


2 ies 
(2; , yi) = lee aoe ex al = a 


"(= 0S,)9 = 2p a as) 
(Y; rs bs; ,y) Pie (Yi 7: se pete (15) 


e now change from the pair (v,;, y;) to the pair (2,, ¢,) 


rere ¢; = 2,y;. Since the Jacobian for the transforma- 
mis J = | 1/x, |, the new joint distribution is 
: 1 1 
Xv; a =. | D172 ox Ta ae = aOR 
(Xi, $i) Ona Cee ap) exp Gn a) 
{(@; — as;.2) — 2p(x; — as;,.) 
(;/X; ia bs; ,) = (6; /2, a bs; el. (16) 
he Fourier transform relative to ¢; is 
W(x;, E,) = i W{a:, oe” dd; (17) 


3 1 


744) 


Qitp + F1 — p)] 


— 2x,[as;,.(1 — ipt;) + bs;,,] + a’si,}. (18) 


he characteristic function for ¢; may now be found by 
tegrating this result over all x;: 


ii 
Vee ioe). 


&) 


2i¢,abs; 28:4 — €:[b si, — 2Zoabs;.s;, + eal) 
1 u yu 4 
Sy "1 1 — 2itip HEC — p’) ae 


nce the taps have been spaced at Nyquist intervals, 
ie ¢; are independent and the characteristic function 
the sum ¢ = >~*_, ¢; is the product of the k character- 
tic functions W,(é;) or 


(é) = [1 — 2ove + FC — pp’)? 


[3(a° + b°) — pabr] — céabr 
1 — 2pié + F(1 — p’) 


-exp —o R é (20) 


here 


u 


k k 
= ey ioe 
=i 7 


we substitute 


then 


Wi) = = i Wwe de, 


(21) 
L > ee 
= inn He (1 + } k/ 
R he’ — tec 12h : 
(ORD = ees at apes ex (eose: dz (22) 
where 
L=L@,») =(- Bee 
Prom Me 2 eee Pe 2 9 
“exp 3|¢ — Ro abr (a +b)? ], (23) 
1 —p 2 
c =o [abr(1 + p) — p(@ + 6] 
vA 2 Be 2 
= ABr(l + »”) — i{ = ae | < BCs 


h =o ([4(a@ + b*)(1 + p’) — 2pabr] 


2 2 2 
2p ie + Fa Ep) = SpA Rr antes) 
2 Nor Gi, 


The characteristic function in (20) can also be developed 
as a particular case of already established results. Mid- 
dleton* presents the derivation of the characteristic func- 
tion for the quadratic form 


e= V'JIV (26) 


where J is a symmetric matrix and V a column vector 
of random components. Our variable 6 can be put in 
this form by letting the first k components of VY represent 
the quantities X, and the second k components of V 
represent the quantities Y; and by choosing J of the form 
" | 
E08 


J = 


where J is the identity matrix. For the limited correlations 
permitted by (4)-(6), the covariance matrix for the V; 
has only a single paired set of off-diagonal terms, and the 
required matrix inversions can be written down directly. 


Tur Format EXPRESSION 


Exact expressions for the integral in (22) in terms of 
known functions may be obtained for three cases of 
primary interest, as follows: 


Casen — n: If there is no signal in either channel, then 
A B Cc h 0, and 
Li(¢, p) io CG eer 
Winld) = ie ea dz. (27) 


4D. Middleton, “An Introduction to Statistical Communica- 
tion Theory,” McGraw-Hill Book Co. ‘Inc., New York, N. Y., 
sec. 17.2-1, p. 738; 1960. 
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The integral is a Bessel function of imaginary argument,” 
and we have 


“exp ¢ rae: | p Viana gl Lt. (28) 


This is the same result that Wishart and Bartlett® obtained 
when they considered a similar sum of products. 

This same result can also be obtained from Lampard’s* 
paper if one considers that the impulse response of the 
post multiplier filter is the sum of k delta functions where 
the arguments of these delta functions are zero at the 
Nyquist intervals. Since the noise is bandlimited, the 
correlation functions in (53) of his paper become (with 


sine & = sin 7a/7x) 
¥iild) = o2 sinc 2Wt, 
Wis = pose, SINC Zt, 
Yoo = o, sinc 2Wt. 


If there is signal plus noise in one channel, 
bi) randepr 80 


Case n — s: 
and noise only in the other, (A = 


then 
Li¢, 0) 3S ike 
é = 0, 
o 
(SEs Qa 


If also o, = o,, then 
y 1 Fi 2\ —k/2 
Wild) = 5 f A+2) 


“exp = ine isp} exp + lee dz (29) 

“Pie ig RSID (1 aN He 
Expansion of the second exponential factor yields a series 
of integrals similar to (27), and integration term by term 
gives 


eye 
Sy 


Va 


eel. 


7=0 . k : 
J r(3 4 i 


W,,.(¢) aa 


(k-14+27)/2 
) omer rs Mac) 


> W. Magnus and F. Oberhettinger, ‘Special Functions of 
Mathematical Physics,’ Chelsea Publishing Co., New York, N. Y., 
p. 118; 1949. 

6 J. S. Wishart and M. S. Bartlett, “The distribution of second 
order moment statistics in a normal system,’’ Proc. Cambridge Phil. 
Soc., vol. 28, pp. 455-459; 1932. 


January 
Cases — s: If there is signal plus noise in both channels, 
and the signals are aligned, then (A = B = 1) and 


Siz = 8;, Also, as in case n — s, ¢, = oa, and p = O for 
the conditions 


c= 1, 
h=1, 


and the density function becomes 


W..(¢) = ail (1 a 2’) ES izh EE iz) dz. (31) 


After expansion of the second exponential term, integra- 
tion term by term yields a series of Whittaker functions.’ 
For ¢ > 0, 


1.0) 32 ror 
) (32) 
and for ¢@ < 0, 
W..(@) = te” 


co i (k+i-2)/2 
= Z : (Let) W ~ 572, 45-1 22 | p |). (33) 


An alternate form for these series can be obtained by 
expressing the Whittaker functions in terms of the hyper- 
geometric functions »/5.* 

The density function that Marcum’ derives for com- 
posite pulses of signal-plus-noise minus noise can be 
shown to be special cases of the functions given by (28), 
(32), and (33). 


THe MomMENTS OF THE DISTRIBUTIONS 


The generalized characteristic function of (20) offers 
a convenient way of obtaining the moments since the 
nth moment is the coefficient of the term (z£)"/n! Table I 
gives the expressions for the first three moments, and the 
second and third moment about the mean for the three 
cases given above. 

If the nth moment for the density function W’(8) is 
desired, it can be obtained by multiplying the nth moment 
for W(¢) by o°”. 


7 A. Erdelyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, 
“Table of Integral Transforms,” McGraw-Hill Book Co., Ine., 
New York, N. Y., vol. 1, p. 119; 1954. 

8 A. Erdelyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, 
“Higher Transcendental Functions,’ McGraw-Hill Book Co., Inc., 
New York, N. Y., vol. 1, p. 264; 1953. 

°J. I. Marcum, “A Statistical Theory of Target Detection by 
Pulsed Radar,’ RAND Corp., Santa Monica, Calif., Res. Memo. 
RM 753, Math. Appendix, pp. 39-46; 1952. Note added in proof: 
More detailed discussions of series expansions of the frequency for 
the special case k = 1 are given by C. C. Craig, “On the frequency 
function of xy,” Ann. Math. Statistics, vol. 7, pp. 1-15; March, 
1936, and L. A. Aroian, “The probability function of the product 
of two normally distributed variables,’ Ann. Math. Statistics, vol. 
18, pp. 265-271; June, 1947. 
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TABLE I 
Moments For W(¢) 
Case n — n Case n — s Case s — s 
Mean kp 0 R 
Second Moment k(1 + p?) + kp? ie Se IR k+2R + PR? 
Third Moment pk(k + 2)(3 + p(k + 1)} 0 R((6 + 3k) + 6R + R?] 
Variance k(1 +. p?) k+R Ip se 2a 
‘Third Moment About 
the Mean 2kp(3 + p?) 0) 6R 
THe SappLe Pornt APPROXIMATION o 
hope i] 
By using the definitions (12), (13), (24), and (25), Me) 
22) may be written 2 
= { @ 2+ 15 [2hQ + cl + oi (40) 
oe Ee RAGE ERS) 
W@) = 5 | exp |-kP@} az Goa 
‘here er ae Q y{hQ* + Q*) + 260") 
; E 2 In (1 QQ’) aa 1 _ Q° a Gl me Ona ) 
Boi log ee tag ey a ae Lee (Gb) ca) 
G—=(W—Q) + 2701 + 30) 2 2730-0), (42) 
long the imaginary axis, F(z) has poles at 2 = +i, D= 2600 e (43) 
nd has a single stationary point at some imaginary value i ; 
f z between +7 and —7. The saddle point approximation H = (1 — Q’)*(1 + 6Q* + Q*) + 4y(1 — Q”) 
; obtained by shifting the path of integration until it 2 2 4 5 
“(h + 8 10h 10. 5h ; 44 
asses through this stationary point and then taking a UD ne eae fi Cate a) 
‘aylor’s series expansion of F about the stationary point. = (1 — 2){(8Q + Qa — Q) 
Ve define @ as the real root (with Q’ S 1) of F’ (— 7Q) = 0, + 3y(c + 4hQ + 6cQ” + 4hQ* + cQ*)}? (45) 


yhere the prime denotes differentiation with respect to z. 
Ve also let 


Sy 
| 


2F’"(—7Q), 


(36) 
= (37) 


nd substitute z = —7Q + 2u/V kD. Then (34) becomes 


L eas 
V ~ =a i kE—u 
(¢) rein” 
4uF"(—iQ) _ 2utF™”(—71Q) ; | 
exp { 3 Dx/eD 3 ED? ao du. (38) 


ince the integrand is very small except when w is small, 
he terms which involve the third and higher derivatives 
aay be treated as small correction terms. The integration 
ver u is carried out by first expanding the second ex- 
onential in (38) as a power series. The result of this 
aethod of approximation is most conveniently expressed 
y defining both W(¢) and ¢ in terms of the common 
arameter Q. The required set of equations is 


{1 1+ 75, GHG — 10 NG*) + - sas (39) 


Since (40) is a quartic in Q, standard methods could be 
used to write Q as an explicit function of ¢ and y, but 
the numerical work is much simpler if the parametric 
arrangement is used. The density functions for the three 
cases of primary interest may be computed from the 
above formulas with the following choices for the parame- 
ters: 


Wind) -h =e = 0, 
W,,.(@) cal = 25 C= 0, R= 0, 
W,.(9) : h =e Ly Po 0. 


AcCURACY AND LimitTInG Forms 


Fig. 2 shows the values of W,,(¢), W.,(¢), and W,,,(¢) 
as computed from (39) and (40) for the choice k = 11, 

= 0, R = 11. These numerical results were checked 
against the exact expression (28) for W,,(¢) over the 
entire range, and against the series (30) for W,,(¢) at 
two points. In all cases the error was less than 1 per cent. 
For large values of R, the convergence rate of the series 
(82) and (33) is slow enough to make machine computa- 
tion of the series almost a necessity, so no comparison 
checks on the values of W,,(¢) have been made. 

The accuracy of (39) can be increased, if desired, by 
carrying the Taylor series expansion in (38) out to higher 
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Fig. 2—Probability density function W(@). 


derivatives. However, the accuracy of (39) as it stands 
is sufficient for most practical purposes. 

By the Central Limit Theorem, the above density 
functions should approach a Gaussian form for k large. 
In the following discussion, p 0 for all three cases. 
The Gaussian forms obtained by treating Q as small in 
(39)—(45) are 


1 aye 
—o?/2k 
aes 


Winke) > ork (46) 

oak 
W onl Ey 47 
ESCs) ee 
W..(@) obs 1 go PHB) Pkt OR) (48) 


V 2n(k + 2R) 


The Gaussian form is accurate only near the peak of the 
density function. The nature of the initial deviation away 
from the pure Gaussian form is easily found by an expan- 
sion of (40)—(45) in powers of Q. For example, the Gaussian 
approximation for W,.(¢), above, should be multiplied 


by a factor 
vis +. 


fi 


These correction terms form the beginning of an Edge- 
worth series,’° and further terms in the series could be 
constructed from the known moments. Once the distribu- 
tion starts to depart from the Gaussian form, the devia- 
tion can become quite large, and as will be seen below, 
the shape of the tails is more nearly exponential than 
Gaussian. A series of the Edgeworth type does not present 
an easily interpretable picture of the shape of the tails 
of the distribution, because the value of k required to 
keep the fractional error within some specified limit in- 
creases with @. 

This difficulty does not appear in our (39). Note that 
the quantities H/G° and N/G® are bounded for all Q in 


R@—R)*  3R@— R) 
(k + 2R)> (k + 2R/) 


10 H. Cramér, ‘Mathematical Methods of Statistics,’ Princeton 
University Press, Princeton, N. J., sect. 17.7; 1951. 


INFORMATION THEORY 


the range —1 < Q < 1, so that the first order correction 

term in (39) is of order 1/k even for ¢ infinite. The shape- 
of the tails can therefore be determined by letting Q” 
approach unity in (39)—-(45). Neglecting terms, of order 
1/k compared to unity, the three distributions of special 

interest become, for ¢ > Vk, 


January 


y sty 1 \k/2-1 —b+k/2 
Re 1 ete -¢+VéR-3/8R 
3 1 ? ta —~o+2V dR-R+k/2 
ie 
®) © 7aiak AR 


1 as a att Ad Re ‘i 
W.(—¢) = a= ( ak 


—$—R/2+k/2 


(52), 
The contrast between these limiting forms, valid in the” 
tails of the distributions, and the limiting forms (46)—(48), 
which hold near the peak of the distribution, is quite 
sharp. The transition between the two limiting forms is” 
easy to illustrate in the case of W,,,(@), for in that case 
= 0 and (40) becomes a quadratic which is easily solved 
for Q as a function of 7. The variation in W,,,,(¢) depends, - 
for large k, almost entirely on the dominating exponential 
term e “”, and for this case H becomes 


By, = (MAF HEH) + tog (AEA). 9 


For 7 small, E,,,, > 4 n°, corresponding to the Gaussian 
shape in (46), while for 7 large, L,, — n — 4 — 3 log 7, 
in agreement with (49). For the cases ss and sn it is not 
possible to write as simple an explicit expression for H, 
but it is clear from (40) and (41) that H depends on k 
only through the ratios 7 and y. 


“é 


CONCLUSIONS 


In this article we have derived several alternate expres- 
sions for the density function for correlators that have 
their reference signals corrupted by additive noise. The 
series expansions for the density functions are rather 
slowly converging, and except for some special cases are 
not very convenient for numerical computations. The 
asymptotic expressions which result from a saddle point 
integration are best written by defining both the density 
function W(¢) and the variable ¢ as functions of a parame- 
tric variable @. When this parametric form is used our 
results are accurate to terms of order 1/k over the entire 
range of ¢. Explicit expressions developed from the 
parametric formulation are less accurate, but show a 
transition from the expected Gaussian shape near the 
peak of the density function to a more nearly exponential 
shape in the tails of the distribution. 


Summary—In this paper, an analysis is made of a communication 
stem in which the information-bearing signal phase modulates a 
ussian noise carrier. The effect of additive Gaussian noise and 
ear filtering on the first-order statistics of the receiver output noise 
d on the character of the output signal are determined. It is 
own that with regard to determining the distortion of the output 
gnal, the system may be replaced by a single linear filter whose 
put is the modulated signal impressed on a sinusoidal rather than 
a carrier. In this way, conventional FM techniques may be 
ed for the determination of signal distortion. 


I. INTRODUCTION 


HE conventional information-carrying type of signal 
) is the modulated periodic time function. Recently, 
: however, consideration has been given to the use 
{ noise and noise-like signals as carriers of information.’ ° 
foreover, in conventional fading communication chan- 
els, a phase-modulated noise carrier is actually received 
ven if a PM sine wave carrier is transmitted. To counter- 
ct the effect of the fading mechanism, a pilot tone may 
e transmitted. Due to the fading medium, the pilot tone 
received as a narrow-band noise process with an ampli- 
ide and PM the same (assuming nonselective fading) 
3 that impressed upon the PM sine wave carrier. By 
uxing this received pilot tone with the received PM 
arrier, one may presumably remove the noisy PM due 
> the fading mechanism. 

It is the purpose of this paper to investigate the first- 
rder statistics of the noise appearing at the output of 
1ae demodulator for a PM signal employing a noise 
arrier. Specifically, it is desired to determine the effects 
f linear distortion and additive noise on the character 
[ noise and signal at the demodulator output. Since the 
urrier is narrow-band noise, the reference for the phase 
etector must be as near as possible an unmodulated 
splica of the noise carrier. The essentials of the system 
» be analyzed are shown in Fig. 1. As shown, the noise 
urrier is phase modulated by a signal g(t) and then 
istorted by passage through linear filter 2. The resulting 
aveform has Gaussian noise n,(t) added to it (possibly 
ceiver noise) and is then presented to the demodulator. 
he reference signal for the phase detector is shown as 

replica of the original noise carrier, which has been 
istorted by passage through linear filter 1 and perturbed 


* Received by the PGIT, April 15, 1960; revised manuscript 
ceived, August 26, 1960. 

+ Appl. Res. Lab., Sylvania Electronic Systems, a division 
“Sylvania Electric Products, Inc., Waltham, Mass. 

1B. M. Horton, ‘‘Noise modulated distance measuring system,” 
roc. IRE, vol. 47, pp. 821-828; May, 1959. 

2A. A. Kharkevich, “The transmission of signals by modulated 
vise,’ in ‘“Telecommunications,’’ Pergamon Press, Inc., London, 
ngland, pp. 43-47; 1957. 

3R. Price and P. E. Green, “A communication technique for 
ultipath channels,”’ Proc. IRE, vol. 46, pp. 555-569; March, 1958. 
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Demodulation of a Phase-Modulated Noise Carrier* 


PHILLIP BELLO?, associaTE MEMBER, IRE 


by additive noise n,(¢). On the assumption that the noise 
carrier is a stationary Gaussian process, and g(t) is 
deterministic, one may show that the process at the out- 
put of filter 2 is also Gaussian, although nonstationary. 
It then becomes clear that the demodulated output 
(assuming an ideal phase detector) is just the phase 
difference between a stationary and a correlated non- 
stationary narrow-band Gaussian process. Thus, the en- 
semble statistical properties of the phase difference process 
will be a function of time. 


n,(t) 


git) 


Demodulated 
Output 


Distorted 
Phase Reference 


Fig. 1—Functional block diagram of system to be analyzed. 


In the following section, the probability density func- 
tion of the phase detector output will be derived. It may 
readily be seen that if filters 1 and 2 are identical and 
have pass bands much wider than the bandwidths of their 
input signals, then, in the absence of additive noise, the 
output of an ideal phase detector will be just g(t), the 
input PM. If g(t) is zero and the additive noise is absent, 
the phase detector output will contain noise if there is 
any dissimilarity (apart from a gain change) in filters 1 and 
2. Moreover, as will be shown subsequently, even if filters 
1 and 2 are identical and the additive noise is absent, 
phase detector output will contain noise if dg(t)/dt # 0. 
Such noise may properly be called “self”? noise since it 
is present under “no interference’”’ conditions and is due, 
moreover, to the noisy nature of the carrier. It will be 
presumed that the phase detector is conventional, in the 
sense that it is a periodic function of its input (£) with 
a period of 27, 7z.e., that the phase detector cannot distin- 
guish between € = 6 and € = 6 + 2kz, where | 6| <7 
and k = 0, +1, +2, ete. The periodic character of the 
phase detector characteristic may be accounted for auto- 
matically in the calculation of first-order output statistics 
if in the calculation of the density function of &(f), all 
values of &(t) outside the interval (—7, 7) are referred 
to this interval modulo 27. When this is done, all values 
of the phase detector input are confined to the interval 
(—7, 7) and it is only necessary to examine the phase 
detector characteristic within this interval. 

In order for the PM system of Fig. 1 to be usable, it 
is necessary that the noise output of the phase detector 
be small compared with the signal output. It will be 
assumed that the PM is confined to the interval —7 + 
G < ¢< 72 — G, where G > 0 is a guard band inserted 
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because conventional phase detectors are not always linear 
over a full range of 27. The presence of noise and signal 
distortion causes the phase difference process £&(t) to 
have values beyond the interval (—2z + G, m — G). 
However, presuming the modulo 27 representation for 
&(t) has been used, it will still be confined to the interval 
(—1, 7). 

The definition of output signal deserves some separate 
consideration, since the over-all PM system from input 
modulation to phase detector output is both nonlinear 
and time varying. The output signal go(t) will be defined 
as the ensemble average of the phase detector output, 7.e., 


volt) = FEO), (1) 


where F(£) is the phase detector characteristic. It will 
be presumed that the phase detector characteristic 1s 
linear within the interval (-—7 + K < § <a — K), t.e., 


ae Satay ie be Spee (2) 


where K is a positive number less than G. This means 
that when filters 1 and 2 are identical with passbands 
much wider than the bandwidths of their input signals 
and when additive noise is absent, the output of the phase 
detector will be equal to g(t), just as for an ideal phase 
detector. 

It will be subsequently demonstrated that (1) and (2) 
have the useful property of producing an output signal 
which is independent of the additive noises in the direct 
and reference channels. 

A most useful property of (1) and (2) is the result that 
¢o(t) may be computed as the phase response of a linear 
filter (called the equivalent linear filter) to the input 
PM g(t) on a sinusoidal rather than a noise carrier. 

To simplify subsequent analysis, the complex represen- 
tation of real waveforms will be used.*~’ Specifically, the 
concept of a ““complex’”’ envelope for a narrow-band time 
function will be used. A real process S(t) has a representa- 
tion in the form 


for 


S() = Re {e()e"**'}, 


where e(t)e’*°’' has a spectrum confined to positive fre- 
quencies (Re { } is the usual real part notation). When 
S(é) is narrow-banded, e(¢) will be called the ‘“‘complex”’ 
envelope of S(t), since then the magnitude of e(t) may be 
identified with the conventional envelope of S(¢), while 
the angle of e(t) may be identified with the conventional 
phase variation of S(t) about the carrier phase wot. 

In the discussion that follows, it will be possible to 
deal entirely with complex envelopes without reference 
to absolute center frequencies. 


* J. Dugundji, “Envelopes and pre-envelopes of real waveforms,” 
IRE Trans. oN INForRMATION TueEory, vol. IT-5, pp. 53-57; 
March, 1958. 

> R. Arens, ““Complex processes for envelopes of normal noise,”’ 
IRE Trans. on INrormMaTion TueEory, vol. IT-4, pp. 204-207; 
September, 1957. 

6 P. M. Woodward, ‘Probability and Information Theory,” 
McGraw-Hill Book Co., Inc., New York, N. Y.; 1953. 

7D. Gabor, “Theory of communications,’ J. [EE, vol. 93, 
pt. III, pp. 429-457; 1946. 
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Il. Propasruiry Density FUNCTION 
oF PHaskE DEeTrector OUTPUT 


In this section, there will be derived the probability 
density function (PDF) for the output of the phase detec- 
tor of Fig. 1. To facilitate the exposition, Fig. 1 is redrawn 
as Fig. 2 with various pertinent complex envelopes in- 
dicated. 


Z(t) 


Filter 2 


Z(t) =e =n) + Z) 
my 


Fig. 2—Pertinent complex envelopes in system to be analyzed. 


Filter 1 


F(é) 


The complex envelope of the noise carrier is z(t). Note 
that the operation of PM of the noise carrier by ¢$(t) is 
represented in terms of complex envelopes as a multi- 
plication by e’*‘”. The complex envelopes of the impulse 
responses of filters 1 and 2 are given by u,(t) and u,(t), 


respectively. It is readily demonstrated that in the case — 


of a narrow-band filter with a narrow-band input (as — 


assumed here) the output complex envelope is one-half 


the convolution of the input complex envelope with the 
filter impulse response complex envelope. Thus the com- 
plex envelope of filter 1 output Z,(¢) is given by 


Z(t) = gm() & 2d), (3) 


while the complex envelope of filter 2 output Z(t) is 
given by 


Z(t) = 4ua(t) & a(e"*™, (4) 


where the symbol ®) denotes convolution. The phase 
detector reference complex envelope z,(t) is given by 


2,(t) = m(t) + Z(t), (5) 


where 7,(t) is the complex envelope of an additive noise. 
Similarly, the phase detector input 2.(t) is given by 


z(t) = no(t) + Z,(t), (6) 


where 72(t) is the complex envelope of another indepen- 
dent additive noise. The phase difference process £(t) is 


readily seen to be expressible as the angle of z*z,. Thus, the — 


phase detector output is shown in Fig. 2 to be computed 
by a cascade of two devices, one of which extracts the 
angle of z%z. and feeds it to the other, a no-memory non- 
linear device with characteristic F'(£). 

Presuming that the noise carrier is an ergodic Gaussian 
process, the z(t) is a complex-valued Gaussian ergodic 
process.” If it is assumed that the modulation is a deter- 


’ For a definition of the complex valued Gaussian process and 
some of its properties, see J. L. Doob, Stochastic Processes,’’ John 
Wiley and Sons, Inc., New York, N. Y., p. 71; 1953. 
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inistic function of time, then 2(f)e’*‘” is still normally 
stributed, although nonstationary. Since any linear 
eterministic operation on a normal process leaves the 
ormal character unchanged, it is clear that Z,(t) and 
o(t) [and thus 2,(¢) and 2,(t)] are normal complex 
rocesses, although Z,(t) [and 2,(t)] is nonstationary. 


It will be convenient to denote — at a given time instant 
s &,. The PDF of &, can be expressed as 


VE) = if ic bel 1G. aon 


-F(rye’*, roe’*)ryry dr, dro dd, dO., (7) 
here 
2, = ne" 2 = re”? (8) 


nd 6(€ — A) is a unit impulse at € = A. The function 
(21, 22) 18 the jomt PDF for the compiex normal variates 
, 22. Assuming that 2,, 22 have zero mean, this function 
; given by 


(1, 22) 
i ae | 21 |? | Zs : = 
es | (1 = pi) cs eee 


(2m) | a 


> Elz (1 — py) 


(9) 


here | z, |? is the ensemble average of 2,2%, y, is the nor- 
alized complex cross-correlation coefficient, 


2 iSo Bt 
he as ——=S= 3 wane pie ’ 
Lewes 


nd p,, 8; are 1ts magnitude and angle. 

The integrations with respect to 6, and 6, in (7) are 
rivial. The integrations with respect to 7; and rz may 
e carried out with the aid of the integral, 


(10) 


: | GY expC Ye OXY cose) dX ad ¥ 
(0) 0 


(11) 


hich may be found in Rice.” Considering that &, is 
onfined to the interval —7 < & < m (in the manner 
iadicated in the Introduction), it is readily determined 
hat 


1 1 — p: 
VE.) = On |; — p, cos (& — ot 
[i 4 20s (f1 — Bi) fr — cos" {pr ¢0s (Gr = i) 
Ay ey Re aa 


0 < cos’ [p, cos (& — B,)] <7 
—7 <& <7. 


= 1 osc’ 1 — € cose), 


(12) 


°S. O. Rice, ‘Mathematical analysis of random noise,’ in 
Noise and Stochastic Processes,’’ N. Wax, Ed., Dover Publications, 
nc., New York, N. Y., p. 207; 1954. 
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It is important to note that the PDF of €, is entirely fixed 
by y: = p. exp [j6,]. When p, = 0, the PDF of & becomes 
uniformly distributed. As p, increases from zero, the PDF 
begins to peak at £ = 6,. As p, approaches 1 (it can 
never be greater than 1), W(é,) becomes an impulsive 
type PDF centered on & = £,. In the limit p, = 1, 
W(é,) becomes a unit impulse located at € = 6,. Fig. 3 
shows a plot of W(é,) for p; = 0.95, 0.995 and 0.999. 


x. x 
Pie) B, B® 


ra 
5 B35 


Fig. 3—Density function for &:; p, = 0.95, 0.995, 0.999. 


Examination of Fig. 3 shows that, as evidenced by the 
spread of the density function, a non-negligible phase 
noise is present in the phase difference process even for 
p, = 0.950. For p, > 0.950, the approximation 


t 5 2 << t < 
ete atl, SOS ee. 
7 & eZ 1, 


WE) &5 (18) 


where 


YE inh Bia 9a 
a ee O° a 
V1 = 9: 


may be used. In fact, the approximation represented in 
(13) would be graphically indistinguishable from the cor- 
responding curves of Fig. 3 if plotted for the same values 
of p,. However, there is one situation in which (13) will 
not give an adequate representation to W(é,), even for 
p, > 0.950. This occurs when 8, is close enough to 7 
or —7. To be more precise, if L denotes the length of 
the interval over which (13) has significant nonzero values 
(according to some reasonable criterion), then this ap- 
proximation should only be used if | 6, | < am — L/2. 
For values of p, near 1, L will become small enough so 
that the condition | 6, | < « — L/2 will not be too re- 
strictive. It will be reasoned now that, in fact, | 8, | 


tan {sin p,}, (14) 
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cannot become appreciably greater than 7 — G when 


L = 
9 <G-K 


(15) 
and the output signal distortion is small. 

The reasoning is as follows. If (15) is satisfied, then the 
density function W(£,) is confined to the linear portion 
of the phase detector characteristic and is closely given 
by the unimodal function of (13). Therefore, 


go = F(®) = ER B,. (16) 
If signal distortion is small, then 
Bn dos! A eG (17) 


as was to be demonstrated. 

For a fixed pair G, K, (15) will be satisfied automatically 
if p, is sufficiently close to 1 or, equivalently, if a, > 1. 
It will be assumed in the remainder of this paper that (13) 
yields a satisfactory approximation to W(é,). Without 


going into the algebra, it is readily demonstrated that if 
the output noise V, is defined as 
N, = gE, a holt), (18) 
then 
a ar 1 
| N, a 
ay 
ES (19) 
at 
P 
MENG Rea 
\/ trate 
provided a,(r — | 6, |) > 1. 


Since the average magnitude of the fluctuations of &, 
about its average value ¢, is just 1/a,, and since the peak 
value of W(é,) [see (13)] is a, /2, it is convenient to visualize 
the density function of £ as being contained within a 
rectangle 2/a, wide and a,/2 high centered on ¢». Actually, 
the area of W(E) contained within the limits ¢, — 1/a, 
and ¢) + 1/a, is 0.707, as is readily seen by use of the 
last part of (19). 

The expression for the mean squared value of NV, is 
not as simple as that for the mean absolute value of N,. 
Because of the simplicity of the latter expression, it has 
been found more convenient for use as a measure of the 
spread of &, (or V,). Thus, if one assumes a peak output 
signal of ¢,, radians, a simple measure of the output peak 
signal-to-noise ratio is ¢,,/| N, | = adn. On this basis, 
the curves shown in Fig. 3 for p, 0.950, 0.995, and 
0.999 correspond to signal-to-noise ratios of 3, 10, and 22, 
respectively, for a peak output signal of 1 radian. 

To summarize briefly: if it is assumed that 


a, (ar 2 | B » > i 


then the determination of the output signal involves an 
evaluation of 6,, the angle of 
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(the normalized complex cross-correlation coefficient be- 
tween the PM signal and the phase reference signal) 
while the determination of the output noise level (on 
an ensemble basis) requires an evaluation of p, = | ¥ |. 


III. EVALUATION OF THE CoMPLEX CORRELATION 
COEFFICIENT 


In this section, an expression for y, will be derived, 
showing its dependence upon the input modulation, the 
autocorrelation function of the noise carrier, and the 
impulse responses of the filters. 

If it is assumed that », and 7, are independent of one 
another and of z,; and 22, then 


ef ee OE EEO Owe 


Bt, = Sh hes 
2%, =| Z, |? + fa? ]2P 2 4 ePe (20) 
2%. = [Zo | ene t= DP ae ee 
where the ensemble powers Ps,, Ps., P»,, Pn, are as defined 
below: 
Pe Sela Pe Sere aaa 
It follows that y, may be written as 
jo d, (22) 
where 
ig KT 
y ae d= : - (23) 


(i+ Be)(t + Fe) 


Thus, y, is representable as the product of the complex 
correlation coefficient with no additive noise, 7,, by a 
degradation factor d, < 1. This degradation factor de- 
pends only upon the ensemble signal-power-to-noise-power 
ratios at the two inputs to the phase detector. The reason 
for calling d, a degradation factor is clear from our discus- 
sion of the probability density function of €,. Any effect 
which decreases the magnitude of y, is a degrading effect, 
since the output noise is thereby increased. We will turn 
our attention almost exclusively to an evaluation of y,, 
since d, is only a function of signal-to-noise ratios before 
phase detection and may be dealt with afterward. Note 
that since d, is a real positive quantity, 


g(t) = angleof y, = angleof 7,. (24) 


Thus, the output signal g(t) [as defined in (1)] zs not de- 
pendent upon the additwe noise and may be calculated 
directly from 4,. 

Turning now to the evaluation of 7,, we note from Fig. 2 
that 
Z(t) = 22(t) © mi) 


= 3] melt = 6) de (25) 


Zaki) = wee? @ ul) = 4 f walBelt — Bei*-” ag, 
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‘onsequently, 
AZ, = 3 [ff ei@mtarerre® 
z*(t — o)e(t — B) do dB 
AZ, = 3 | ‘ il “ X(o)ms(8) BU = @Ab — B) do 8 (26) 


Lio = a il Huai Ge oe es 


-2*(t — a)e(t — 8B) da dB 


assuming the validity of the interchange of integration 
nd ensemble averaging). 

The statistical averages inside these integrals require 
. little discussion. 

It will be recalled that the z(t) process is stationary. 
The ensemble autocorrelation function 2*(t,z(t; + 7) is 
hen a function of 7 only, 2.e., 


2*(t)elh + 7) = R(z). (27) 


In terms of the real and imaginary parts of z(t), R(7) 
nay be expressed as 


R(z) = 2R..(7) + j2R,,(7), (28) 


vhere the even function F,,(7) 1s the common autocor- 
elation function of = Re {z} and y = Im {z} and the 
dd function R,,(7) is the crosscorrelation function be- 
ween « and y. 

Using the autocorrelation function R(7), we may write 
he averages in (26) as 


eT, = / / u%(o)us(Be*'R(o — B) do dB, 
(6) 0 
AL, =} [ i u%(o)u,(8)R(o — 6) do d8 = 2Ps,, (29) 


‘R(o aad 8) do dB = 2Ps,, 


nd obtain the following general expression for 7;: 


if i p(o)us(Be"*'R(o — 8) do dB 
= 8VPsPs.(t | 


A 


Yt 


(30) 


ecause of the nonstationary character of Z,(t), we note 
hat the ensemble average power Ps, is generally time 
arying. 

It will now be demonstrated that it is possible to 
etermine the output signal g(t) as the PM response of 
linear filter to the input PM ¢(t) on a sinusoidal carrier. 
To demonstrate this fact, we note from (30) that 


() = angle of ff X(o)us(Be'*-P Ro — 8) de dB. 
(31) 
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If we define 


(oe) 


p,(8) = il pi(o)R(o — B) do, (32) 


then 


i i ui(o)u(B)e'?“-R(« — 6) do dB 


oo 


= i} n%(8)ua(B)e?” dB = atu, @ei’. (33) 


Eqs. (81) and (33) show that the output signal may be 
regarded as the angle of the output from a linear filter 
with impulse response f*2 when the input is e’*‘”. The 
conventional real band-pass interpretation of the previous 
statement is that if cos [wot + ¢y(t)] 1s the input to a filter 
with impulse response Re [@%u2 exp (jwot)|, then the out- 
put signal will be A(t) cos [wot + go(t)], where A(t) is 
the envelope and g,(t) 1s the phase of the output signal. 
It is assumed in this latter statement that w, is large 
enough so that the input and impulse response are narrow- 
band time functions. The filter with impulse response 


h(t) = Re {Aimze'°""} (34) 


will be called the equivalent linear filter. Fig. 4 depicts 
the determination of the output signal with the aid of 
the equivalent linear filter. 


| a(t), |-—> A(t)eos|,¢ 4p 96t0] 


Fig. 4—Determination of output signal. 


cos{u, cts 7 


It is of interest to examine ¥, for three special situations: 
1) no signal modulation, 

2) wide-band noise carrier, 

3) narrow-band noise carrier 

We will consider these in order. 


No Signal Modulation 


This case is of special interest, since an evaluation of 
¥, for no input modulation determines the background 
“self” noise of the system. When g(f) is zero, 


‘ i u'(o)u2(6)R(o — B) do dg 
Fily=o corer Ba) 0A oe oy es : 


Since | 7 | does not equal one, in general, noise will appear 
at the phase detector output. This self noise is due to the 
dissimilarity in the filters, since when u,(t) = w(t), it 
is readily determined that | 7) | = 1. 


(35) 


Wide-Band Noise Carrier 


If the spectrum of z(t) overlaps the pass bands of filters 
1 and 2 and is essentially constant in the overlap region, 
then, in the integrals defining Z%Z., Z*Z,, and Z*%Z., we 
may set 


Ga) = 1G NGS). (36) 
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where 6(r) is a unit impulse at 7 = 0, and K is the value 


of the spectrum of z(t) at w = 0. In this case, 


ET ei Lane 
A | uk(a\uslole”" de, 
J0 


TD, == [| m(o) P de, (37) 
Tile = =f ieee) ? do, 


and 


oo 


ip (inte? de 
1 = y (38) 


i | ps P do | | 2 | do 
0 0 


If we normalize pu, and po so that 


IP Onde te 
: (39) 
[mF ae = 1, 

then 

4 =f wtloua(orel®? do = [wi Oual)} @e'*. 40) 


Thus, we have the interesting result that in the case 
where the spectrum of z(t) is essentially constant over the 
pass bands of the filters, the normalized complex corre- 
lation coefficient is just equal to the response of a linear 
filter with impulse response wi ue(O) to the excitation 
e/* This fact is depicted in Fig. 5. Thus, for a wide- 
enh noise carrier, the equivalent linear filter determines 
both signal and noise output. 


oJ o(t) w(t) (t) 1, 


Fig. 5—Determination of 7, for wide-band noise carrier. 


Narrow-Band Notse Carrier 


When the bandwidth of z(¢) is narrow compared to the 
bandwidth of filters 1 and 2, we may take 


R(r) = (41) 
where P is the mean squared value of z(t). It is then 
readily determined that 

| utto) do [ ulBe'*-? as 
JO 0 (42) 


i¢(t—-B) dg 


au [sto ae | [sli 


Examination of (42) shows that | 7, | = 1. Consequently, 
there is no self noise at the phase detector output. The 
output signal is 
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g(t) = angle of | u(t) dt 
0 


+ angle of / Une Tabs (43) 
0 

Thus, apart from an additive constant, the output signal 

is just the angle of the output of filter 2 with e’*‘” as input. 

It is interesting to note that, as one should expect, 

the output signal given by (44) is just what would be 

arrived at if a sine wave carrier were initially assumed. 


IV. DETERMINATION OF SELF-N o1sE OuTPUT 
FOR A SPECIFIC EXAMPLE 


It is worthwhile to consider a specific case of the 
evaluation of | 7) | in order to get a feeling for the relative 
effects of the noise carrier bandwidth and filter dissimi- 
larity in producing self-noise output. If it is assumed that 
filters 1 and 2 are single tuned RLC filters, and that the 
noise carrier is obtained by passing white noise through 
an RLC single tuned shaping filter, then we can take . 


Hit) =n i ie nO 
p(t) ue Cyt ee ye 0 (44) 
Tipe ae 


The half-power bandwidth of filter 1 is 2a, of filter 2 
is 2b, and of the noise carrier is 2k radians/second. It: 
should be noted from (44) that in addition to differing 
in bandwidth by 2 | 6 — a | radians/second, the filters 
differ in center frequency by © radians/second. 

It is readily determined that 


a 1 1 1 
Whe = | at |: 
eae ales 

a(a + k)’ 


(Oi eae) ; 
b[(b + ) -F 


ZZ, = (45) 


ZZ, = 


After some algebraic manipulation, one finds that 
ace [ 4ab | 
Yo (b SE a)? rs Q 
Oa 


Li Wee bt Soe ea (46) 


at | 

It is interesting to observe how | 7, |’ varies with the 
carrier half-bandwidth k. In Section III, we found that 
for zero carrier bandwidth the “self”? noise disappeared, 
while for large carrier bandwidth the degree of self noise 
approached a limiting value. One intuitively expects the 
degree of self noise to decrease monotonically with de- 
creasing carrier bandwidth. This means, in our example, 
that | 7) |’ should approach 1 asymptotically and mono- 
tonically from below, as k decreases. 

Examination of (46) shows that this is indeed true. 
For k = ©, the second term in brackets becomes unity. 
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hus, the first term in brackets is the k = © value of 
Yo * | Yo lc. One readily determines that the second 
rm in brackets increases monotonically as k decreases 
yproaching the value 1/| yo |Z. It is clear that a pessi- 
istic estimate of the output noise will be obtained by 
suming that the carrier bandwidth overlaps the filter 
vndwidths. 

Approximate expressions valid for small self noise will 
yw be derived. Let the percentage difference in filter 
indwidths be defined as 6, 


ra 


;, (47) 


id let us choose k = a, 7.e., a carrier noise bandwidth 
jual to the bandwidth of filter 1. Under the assumption 
at 6 and Q are small, we may obtain the following ex- 
unsions for the two square bracketed terms in (46): 


g is -| 4ab | 
YO jo (bats a) oe ee 


(J =Qy ens aw 
; (5 ue V : “| Y 4a + oH + a 
n14 (+(e 
‘hus, 
ara $Ql 82). a 


From (14) and (19), it is clear that the square of the 


verage magnitude of the self noise output (| N,| )” is 
iven by 


eg | Na 
t é == : = 
(W, )* = 73 


| 2 


(50) 


Vhen | y, |* is near one, it is convenient to express it as 
2 2 
Pigeliecs ee ace (51) 


here & is a small positive quantity. Then (| NV, |)’ may 
e represented by the series expansion 


(QM, jvyeelteteat---] xe. (52) 


Using (49), (51), and (52), the self-noise output for 
1e case at hand is approximately 


ry RIG + Q)] mo 


= (53) 


[G+] = +-- 


here the subscript t has been dropped, since with g(t) = 0 
e ensemble averages are time invariant. Eq. (54) indi- 
ites that the k = © bound is reasonably good for the 
se in which the carrier bandwidth is of the order of 
e filter bandwidth. 
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If it is desired that the average magnitude of the self 
noise should not exceed 0.05 radians for wide-noise carrier 
bandwidth, then from (53) it is seen that 


se) Ey < 0.05. 


Certainly then, the filter bandwidths should not differ 
by more than 10 per cent and the offset in center frequen- 
cies should not be more than 5 per cent of the filter band- 
width if there is to be any possibility of the average 
magnitude of self noise being less than 0.05 radians. The 
tolerances only increase by a factor of 1.15 if one assumes 
k = a instead of k = o. 


(54) 


V. EVALUATION or SELF NOISE FOR GENERAL 
SLIGHTLY DISSIMILAR CHANNEL FILTERING 


« 


In this section, attention will be focused on the deter- 
mination of the output self-noise level. It will be assumed 
that small dissimilarities exist in the filtering on the direct 
and reference channels. For simplicity of analysis, the 
noise carrier bandwidth will be assumed wide compared 
with the filter bandwidth. In view of the analysis in 
Section IV, one may expect the resulting upper bound 
on self-noise performance to be useful for noise-carrier 
bandwidths at least as small as the filter bandwidths. 

Various distortions may be assumed in one filter rela- 
tive to the other. Three simple distortions which are 
easy to characterize analytically are bandwidth, center 
frequency, and delay differences. 

The complex correlation coefficient for large carrier 
bandwidth and zero signal modulation is given by 


fo= f utOus(0 at, (55) 
0 

on the assumption that | u, |’ and | uw |’ have been norma- 

lized to unit area. According to Parseval’s theorem, an 

equivalent frequency domain expression is 


fo = [UNUM af (56) 


where 


Uf) = / Pee naan mie ole (57) 
0 
If filter 2 differs from filter 1 only by a small percentage 


change 6 in bandwidth, one may take 
u(t) = V1 + du(t + 3d), 


where the factor 1 + 6 is included to satisfy the 
normalization condition. A difference Q in center frequency 
may be denoted by 


(58) 


p(t) = e Sat) 


while a difference in delay is represented by 


u2(t) =m(t me ok 


(59) 


(60) 
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The corresponding expressions for Yo are 


vo = Bis) = V1 +8 “ux(Qu(t + 6t) dt, — (61) 
vo = (9 = | | n Pe™ de, 
yo = Dr) = {| w*(Oult + 2) dt, 


where the subscript 1 has been dropped. It is interesting 
to note that yo(r) is just the autocorrelation function of 
u(t). If one considers that the channel filters differ both 
in delay and center frequency, then 


[o) 


yO, 2) = ff uhOule + del dt, (62) 


which is just Woodward’s ambiguity function'® in delay 
and Doppler shift for a radar pulse u(t). 

For small 6, 2, and 7, one may obtain Taylor series 
expansion of | y) |” in 6, 2, and 7 [assuming certain regu- 
larity conditions on u(t)] whose first terms represent 
simple approximate expressions for self-noise output. In 
this connection, let 


~u(é)-= ade, (63) 


where a(t) is the magnitude and X(t) is the angle of u(t). 
Then it may be determined that 


[2@ f=1- #4f 
-| f wsae] + 3h. 


foe) 


falar” — a] dt 


ECan Sree (64) 
OG al Sea 
where A’, V’, are the mean squared duration and mean 


squared bandwidth, respectively, of u(t) (recall that 


| u(t) |? and | V(f) |? have unit area), 


a=f@-plwora, t= fluo tae 


(65) 


v=f/¢-pAlugra, f= [rom Pa. 
The dots in (64) indicate differentiation. 

Thus, the squares of the average magnitude of self noise 
for the three cases above are 


(| N |)? = 6X” (percentage difference of bandwidths 6), 

(IN)? Swue Cmpiongs of center frequencies Q (66) 
radians/second), 

(N |)? = 7°(2rV) (difference of delay 7). 


10 Woodward, op cit., p. 120. 
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One may consider the general situation in which these 
three types of filter differences exist simultaneously. This 
problem is involved algebraically due to the appearance 
of six cross product terms, and no such generalizations: 
will be presented here. 


VI. EVALUATION oF OUTPUT SIGNAL AND NOISE 


The degree of noise and the amount of signal distortion 
present at the phase detector output will be studied in 
this section. It will be assumed, as in the previous section, 
that the noise carrier bandwidth is large compared to 
bandwidths of the channel filters. In this case, 7, may 
be represented as the response of the equivalent linear 
filter to e?*°”, as shown in Fig. 4. This filter has an impulse 
response yu*(t)u.(t) for the wide-band noise carrier case. 
Because of this representation, the problem of the deter-| 
mination of y, is reduced to a familiar PM problem: the 
determination of the envelope and phase of the carrier 
at the output of a linear narrow-band filter, when the 
input is a phase modulated wave. Thus, we may use the 
Carson-Fry"’ and Van der Pol-Stumpers’”’’* expansions. 
These expansions have recently been studied by Bagh- 
dady"*, who has precisely defined the conditions under 
which quasi-stationarity holds. The reader is referred to 
this paper for a discussion of the conditions leading to. 
the validity of the quasi-stationarity representation. We 
will only present a heuristic approach here. Quasi-sta- 
tionarity implies that the output of the filter may be 
computed as if the input phase derivative varied at an - 
infinitesimal rate. Thus, in the convolution integral (40) 
defining y,, one may use the approximation 


(67) 


arrived at by replacing g(t — 7) by the first two terms 
in a Taylor series expansion about 7 = 0. It follows that 


i¢(t—T) 


+ile(t)—re(t 
e =e ile Te ( ee 


| 


1, = eir® / pt(r)uo(re 77? dr (68) 
0 


is the quasi-stationary representation of the complex cor- 
relation coefficient. 
From the quantity d(t), defined by 


Ory ent = | 


it is readily seen that both the distortion of the output. 
signal and the degree of output noise may be determined. 
Assuming quasi-stationarity, the angle of d(¢) is just the. 


signal distortion, while (1 — | d(t) |*)/(| d(@) |?) is the” 


square of the average magnitude of the self-noise output. 


_ uJ. R. Carson and T. C. Fry, “Variable frequency electric 
ney theory,’ Bell Sys. Tech. J., vol. 16, pp. 513-540; October, 

37. 

2B. Van der Pol, “The fundamental principles of frequency 
modulation,” J. TEE, vol. 93, pt. 3, pp. 153-158; May, 1946. 
FF. L. M. Stumpers, “Distortion of frequency modulated 
signals in electrical networks,’ Comm. News, vol. 9, pp. 82-92; 
April, 1948. 

“i. J. Baghdady, “Theory of low distortion reproduction of 
FM signals in linear systems,’’ IRE Trans. on Circurr TuEory, 
vol, CT-5, pp. 202-214; September, 1958. 


xamination of (68) shows that d(t) may be interpreted 
the case of quasi-stationarity as the complex correla- 
on coefficient resulting from a separation of the channel 
er center frequencies by an amount 2 = —g(t). Thus, 
hen quasi-stationarity is valid, one may visualize the 
odulation process as slowly shittin’ the center frequency 
filter 2 back and forth relative to filter 1 by an amount 
ual to the input frequency modulation. Since an offset 
center frequencies produces output noise, it is clear 
iat the modulation process will produce a time varying 
ymponent of output noise level in addition to whatever 
If noise is present due to filter dissimilarity. When the 
ters are identical, it is clear from (68) and (61) that the 
itput signal distortion g(t) — ¢(#) is given by 


volt) — v(t) = angle of C(—¢). (69) 


For sufficiently small g(t), the output squared average 
agnitude noise is 


(Ny |? = 2°@). (70) 


he ensemble noise level is time varying and may be 
veraged to obtain a measure of the time average back- 
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ground noise level. The measure of this background noise 
level will be taken as the square root of the time average 
of (| N, |)”. From (70) this is given by 

VU N, )?) = AV@)), (71) 
where ( ) denotes the time average. Thus, in the wide- 
band noise carrier case, the time average background 
self-noise level in radians (small self-noise case) is directly 
proportional to the RMS value of the input FM in ra- 
dians/second. The proportionality constant is the RMS 
duration of the filter’s impulse response in seconds. 

In conclusion, it should be pointed out that the entire 
analysis of this paper has assumed the existence of a 
very wide bandwidth at the phase-detector output. In 
actuality, this bandwidth would be limited to the band- 
width of g(t). Thus, the noise levels predicted will only 
be upper bounds if the output noise bandwidth is larger 
than the bandwidth of g(t). To account for the effects 
of an output filter would require the evaluation of the 
autocorrelation function (or power spectrum) of the phase 
difference process &(t). This is beyond the scope of the 
present paper. 
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Summary—This paper gives a method for constructing minimum- 

.dundancy prefix codes for the general discrete noiseless channel 
ithout constraints. The costs of code letters need not be equal, and 
ie symbols encoded are not assumed to be equally probable. A 
lution had previously been given by Huffman [10] in 1952 for 
.e special case in which all code letters are of equal cost. The 
‘esent development is algebraic. First, structure functions are 
2fined, in terms of which necessary and sufficient conditions for 
ie existence of prefix codes may be stated. From these conditions, 
near inequalities are derived which may be used to characterize 
efix codes. Gomory’s integer programming algorithm is then 
sed to construct optimum codes subject to these inequalities; 
ymputational experience is presented to demonstrate the practic- 
vility of the method. Finally, some additional coding problems are 
iscussed and a problem of classification is treated. 
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vides sharp upper bounds on the transmission 

rates achievable using letter-by-letter coding for 
discrete noiseless channels, the problem of constructing 
codes which maximize the transmission rate in specific 
situations has not been completely solved. Shannon [19] 
in 1948 and Fano [5] in 1949 gave essentially identical 
methods for constructing near-optimum codes when all 
code letters have equal cost. In 1952 Huffman [10] used 
an elegant combinatorial approach to obtain a strictly 
optimum solution to this problem. Blachman [1] in 1954 
generalized the Shannon-Fano approximate technique to 
treat the situation in which the code letters differ in cost. 
Marcus [14] in 1957 improved on the Blachman tech- 
nique by combining it with further combinatorial results 
of Huffman. 


\ LTHOUGH Shannon’s Fundamental Theorem pro- 
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The present paper describes an algebraic approach to 
the construction of minimum-redundancy (minimum-cost- 
per-bit) prefix codes for discrete noiseless channels such 
that all sequences of code letters are permitted. In Sec- 
tion II structure functions are defined, in terms of which 
necessary and sufficient conditions for the existence of 
prefix codes may be stated. In Section III two sets of 
linear inequalities in integer-valued variables are used to 
characterize prefix codes. Next, in Section IV, Gomory’s 
integer programming algorithm is used to construct 
optimum codes subject to these inequalities; computa- 
tional experience is presented to demonstrate the practi- 
eability of the method. Finally, in Section V, some 
additional coding problems are discussed, and the results 
of earlier sections are applied to a problem of classification, 


I. DEFINITIONS 


We assume as given an alphabet of symbols @ = 
{X,, Xo, -:: , X,} and an information source generating 
source messages consisting of sequences of these symbols 
in which X,; occurs with probability p,. These messages 
are to be encoded for transmission over a communication 
channel admitting the code letters s,, sz, i aes 
to be done by associating with each element X,; of @ 
a code word C, consisting of a nonempty sequence of code 
letters, and substituting for each source message an encoded 
message obtained by systematically substituting code 
words for elements of @. 

A set Ir = {C;, C2, --- , C.} of code words is called a 
code. A code is said to be uniquely decipherable if each 
finite encoded message is an encoding of a unique source 
message. Sardinas and Patterson [17] in 1953 have given 
a test for unique decipherability. 

A sequence a of code letters is a prefix of C; if C; = a-8, 
where - denotes juxtaposition and either a or 8 may be 
the null sequence ¢. The prefix is called proper if 8 ¥ ¢. 
Thus, the prefixes of the code word 8,838,8,S2 are $82, 8283, 
$2838), $283,851, aNd $2838,S,S2. Only the last of these is 
improper. A prefix set is the set of all elements of @ for 
which the associated code words begin with a given prefix. 
A prefix code is a code such that no code word is a prefix 
of any other. Any prefix code is uniquely decipherable. 
A prefix code is exhaustive if, for any two code letters s; and 
S,, @:S; 18 a prefix of a code word if and only if a-s, is a 
prefix of a code word. When an exhaustive prefix code 
is used, any sequence of code letters is the beginning of 
some encoded message. 

The structure of a prefix code may be described by a 
tree, as shown in Fig. 1. Each terminal node is associated 
with an element X; of @. The branches leaving each node 
are labeled with the names of distinct code letters, and 
the code word C; is found by listing in order the labels 
of the branches leading from the root of the tree to the 
terminal node associated with X;. The paths through 
any given node correspond to the symbols in a prefix 
set. The code is exhaustive if there are r branches leaving 


Fig. 1—Tree for an exhaustive prefix code. 


each node, such that each label s,, 1 < k < 7, appears 
on exactly one branch. In the code of Fig. 1, 


C. = §)S], Cr = S28) ) Cy = S28380 , 
Gs = 8,82, G = S280, Cs = S3 ) 
Cz = 1Siss, Ce =. S838), Coma es or 


Let c, be the cost of transmitting the code letter s;,. 
Then if N“ instances of s, occur in C;, the code word for xe 
the cost of transmitting X; is assumed to be 


r 


l; = 5 CN; 


k=1 


(1) 
and the average cost per symbol transmitted is given by 


© = SE p,l;. 


t=1 


(2) 


For uniquely decipherable codes, Shannon’s Fundamental 
Theorem’ yields the following lower bound: 


SD ae 


S 4a=1 = ' ; € 
Cheers sae LP: log. Di, (3) 
where ¢ is the unique positive root of the equation 
gg en eee? een), (4) 


For all communication channels, 0 < ¢t < 1. The quantity 
—logs tis called the capacity of the channel. The bound is 
achieved when 1; = log, p; (¢ = 1, 2, --- , n). 

The efficiency of a code is given by the ratio 


D> Pi log. pi 
foe 2a ae 


ee (5) 
and the redundancy R is given by R = 1 — F. 


1 Shannon, [19], Theorem 9. 
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The problem of minimum-redundancy coding is to find, 
a given set of probabilities p; and costs c,, a uniquely 
‘cipherable code which minimizes @. It is known [13], 
|] that when all the c, are equal, the set of optimum 
des must include at least one prefix code. It is not known 
ether this property holds in general. In this paper, 
tention will be restricted to the prefix codes, and a 
ethod will be given for minimizing @ over this class of 
des. We begin by giving a structural characterization 
“prefix codes. 


Il. Structure FuNcTIoNS FoR PREFIX CopES 


Several writers [19], [11], [13], [16] have shown that 
r code letters of unit cost are available, there exists a 
uquely decipherable code, and also a prefix code, such 


at C; has cost l;, if and only if >°%., r°* < 1. In the 
esent section, we extend this result by giving necessary 
id sufficient conditions for the existence of prefix codes 
the general case. 

As before, N* will denote the number of instances 
Ss, occurring in C;, for a given code I. With each code 
we shall associate a polynomial form ¥(T) in the vari- 
les Wy, We, --- , w,, Which we shall call the multivariate 
ructure function (MSF) of T. This form is defined as 
lows: 


Y(C;) = I] ie 
k=1 
gt lr = (Ce GS, a Cae (6) 


WT) = a WC,) = Plw,, wr, +++, w,). 


ote that y(T) does not in general have a unique inverse, 
2, there will exist codes T and I’ such that ¥(T) = y(1"). 
The following theorem characterizes the exhaustive 
refix codes.” 


heorem 1 


The polynomial P(w,, we, --- , w,) is the MSF of an 

chaustive prefix code if and only if 

PO, 0,.2-<.,,0) =-0, 

2) all coefficients of P are nonnegative, 

B) P(wy, W.,---, Wp) —1=@w,+um+--- +, = 1) 
"QW, We, *>* , W-); 

here Q is a polynomial having only nonnegative co- 

ficients. 

Before proving the theorem, we give some examples 

r the case r = 2: 


a) P = 2w, + 2w, + 3; 
condition 1) is violated. 
b) P= wi + w, + wr} 


condition 2) is violated. 


3 2 
Wi Ws 


2 In the course of developing this theorem, the writer benefited 
ym discussions with Profs. D. E. Muller and M. P. Schiitzen- 
rger. 
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ec) P=wuitwit+3uuw.; P-l=(u,+w —D 
“(wi + ww, — wi, + w, + w, + 1); 
condition 3) is violated. 


d) P=wi t+ w+ ww, + ww. + wr; 


P—1 isnotdivisibleby (w, + w. — 1); 


condition 3) is violated. 


ec) P= wt ww, t+ ww, + w; 
(P-)=Mm4+wm—-DVw+wt+d; 
Pi) “where 7 We=s.s55i5 S857 Soren = 
Proof: 


Necessity: If P(w,, w2, --: , w,) 1s the MSF of an 
exhaustive prefix code, then it satisfies 1) and 2) by 
definition. To show that 3) is satisfied, we use induction 
on the maximum code-word length, where the length of 
C; is >o7_, N*. The only exhaustive prefix code with 
maximum length | has the MSF w, + w. + --- + w,, 
which satisfies 3) with Q@ = 1. Suppose 3) holds for all 
exhaustive prefix codes of maximum length less than ZL. 
Let P(w,, We, --- , w,) be the MSF of an exhaustive 
prefix code of maximum length Z, and let S(w,, ws, - ++ , w,) 
be the sum of the terms of degree L. Then, by the defini- 
tion of exhaustiveness, for all code letters s; and s,, a-s; 
is a code word of length L if and only if a-s, is a code 
word of length L. Therefore, S may be written as 


Sw, UES ONO W,) 


= (wy + We + +++ + w,) Yar) + Wax) +--+ + (a) 


where a, @, -*-: , a, are the proper prefixes of length L — 1 
in the code IT. Let ¥(a,) + War) + + yY(a,) be 
denoted by T(w,, wz, --- , w,), where all the terms of 7 
have positive coefficients. Then the function P may be 
expressed as 


“5 Wr)e =o (Oy te tert amt) 
Wr) = V(w,, We, atl 


P(w,, Chay 2° 


“Tw1; We, i: FU) 


where V is of degree less than L and has only nonnegative 
coefficients. Then if the rp code words of length ZL are 
replaced by the p prefixes a,, a, --- , a new ex- 
haustive prefix code is obtained, having the MSF 


Ap, 


P(w,, We, «++, W,) = Pw, We, +++, Wy) 
— (wy, + w. + --- + ,)T(wi1, We, +++, W,) 
+ T(w,, W2, --*, W,) 
== 5 Lig aprons 1st (Wy) Darran ta) 
and by the induction hypothesis, 
P(w,, W2, -**,w,) — 1 
= (w, + w+ +--+ + w, — IQ, we; ---, w,) 
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where Q has only nonnegative coefficients. Therefore, 
P(w,, Wo, +++, w,) — 1 = (wy, + w+: +w, - DD 
‘(TWw,, Daye SaRtane) ae Ow, , CO eee 5 W,)) 


and (3) is satisfied. 

Sufficiency: We shall show that, if P satisfies 1), 2), 
and 3), then it is the MSF of an exhaustive prefix code. 
Let P be of degree 1. Substituting w, = w. = --- = w, = 0 
in 3), and observing that 1) holds, we find that 


=I = (-)Q0..0;--=, 0), 


Therefore Q(0, 0, --- , 0) = 1; also, Q is of degree zr, 
since P is of degree 1. Therefore Q = 1 and P = w, + 
Wo + +--+ + w,, which is the MSF of an exhaustive 
prefix code. We now proceed by induction. Suppose the 
result holds for all polynomials of degree less than JL, 
and let P be of degree L. Since 3) holds, 


Pw, Ws, 22, 10, — 1 = Whe ioe eee) 
(T(w,, Db aio Bas ee) St Q(w,, WY 95 OO w,)) 


where 7’ is homogeneous of degree L — 1, and all terms 
of Q are of degree less than L — 1; Q and T contain only 
nonnegative coefficients. Then 


P(w,, Wo, °°*, W,) = Pw, We, -**, W,) 
= (Wi AP Wy Cp tes 95D) 
=n T(w,, Ube OS w,) (7) 


satisfies 1), since P(O, 0, --- , @) = TO, 0, --- , 0) = 0. 
Also, since 


PQb;, Wy, 825.) = (oy ow, 4 wD) 
-Q(wi, Ws, Boe ie (8) 


3) is satisfied. To verify 2), note that, since P and T have 
only nonnegative coefficients, the only possible negative 
terms in P arise from the expression 


—(w, + w. +--+: + w,) Twi, Wo, ++, W,), 


and are of degree L. But (8), in which Q is of degree 
less than L — 1, shows that no negative terms of degree 
L can occur in P. Therefore, by the induction hypothesis, 
P is the MSF of an exhaustive prefix code. Moreover, 
by inspection of (7), each term of 7 appears as a term 
in P (if P is expressed with only unit coefficients), since 
no term of T is canceled by a term in P, which has only 
nonnegative coefficients, or by a term in —(w, + We + 

- + w,)T, which is homogeneous of degree L. Then 
each term in 7’ is ¥(a) for some code word a of length 
L — 1 in the code associated with P. If each such code 
word is replaced by the r code words a:s,, a@* 8, +++ , a-S,, 
an exhaustive prefix code having the MSF P is exhibited. 

The quotient polynomial Q may be characterized as 
follows: 

Corollary 1.1: If P = (1), where T is an exhaustive 
prefix code whose code words have the distinct proper 


prefixes 61, 62, °*~ , 6,, and 
(P —1):= We we oo wr DG; 
then Q = ¥(8:) =P V(B2) ac baets ¥(8,). 


Proof: Again we use induction on the maximum code 
word length L.1f £ = 1, P = wy - ws te 
Q = 1 = (9), and the result holds. Suppose the theorem 
is true for codes having maximum word length L. Let 
P be ¥(1), where T has maximum word length L, and 
define Q, T, P, and Q as in the proof of Theorem I. 
Then P = ¥(f), where [I has the same proper prefixes 
as T, except for those of length L — 1, which no longer 
occur. By the induction hypothesis, 


Q = Wo) + vy) +--+: + Wy), 


where y;, Y2, ‘°° » Yq are the proper prefixes of I. Also, 
T = ¥(a,) + (a2) + --: + W(a,), where a,, a2, --* , 
are the proper prefixes of I having length L — 1. There-— 
fore, Q = T + Q has the required property. 

Example 1: In the code corresponding to the tree of 
Fig. 1, we see by inspection that 


P = (8,8) + W(Si82) + W(s:83) + W(S281) + V(S282) 
+ (828381) + W(S28382) + (Ss) + W(S28383) = WoWs 
+ wrws + wiwows + wi + 2wyw. + wiws + wz + ws. 


Inspecting the nonterminal nodes of the tree, which cor- | 
respond to proper prefixes, we find that 


Q = V@) + Ws) + W(S2) + W(s28s) 
= WoW; + WwW, + we + 1, 


and 
P= 1 = (wy ate te 


The following method may be used to construct an 
exhaustive prefix code T having a given MSF P: 


1) Define a polynomial R(w;, ws, --: , w,), initially 
equal to Q(w,, We, --- , w,), and a set S of sequences 
of code letters, initially equal to {¢}. 

2) Choose an element 8 ¢ S, not previously selected. 
If ¥(8) is a term in R, replace 6 by the r sequences 
B°Si1, B-S82, +++ , B-s,, and replace R by R — y/(@). 
If ¥(8) is not a term in R, select another element. 

3) Continue until R = 0. At this point, S will be the 
desired code. 


Next, we give a characterization of prefix codes in 
terms of the costs of code words. For this purpose, we 
assume that the costs of the code letters are integers; 
of course, multiplication by a suitable constant suffices 
to transform any set of rational costs into integers, with- 
out changing the problem of constructing optimum codes. 
As before, let ¢, denote the cost of s,, and let N* denote 
the number of occurrences of s, in C;, for a code T. We 
assume without loss of generality that c, < ec. < --- <¢,. 


cost of C; is given by 
k 
(S=S cN 
k=1 


define F(z) = >°"_, 2'' as the univariate structure 
tion (USF) of T. 


orem 2 


he polynomial F(z) = 50”, a,z’ is the USF of an 
austive prefix code if and only if 


) a; => 0 for all j, 
PRE) — | = 27’ + 27 + + 2° — 1)G@), 


where G(z) = ),70" bjz’, and b; > 0 for all j. 


roof: The result that the USF of any exhaustive pre- 
code satisfies 1) and 2) is obtained by making the 
stitution w, = z2°* in condition 3) of Theorem I. The 
iverse is proved by induction on the degree of F(z); 
: steps in the proof correspond exactly to those in the 
»of of sufficiency in Theorem I. 

orollary 2.1: The polynomial 5°”, a;z’ is the USF 
a, prefix code if and only is there exist integers d; such 


= d;, 1 = ] = mM, 
™ , d;z’ is the USF of an exhaustive prefix code. 


Yoroliary 2.2: If F(z) is the USF of an exhaustive prefix 
le T, and F(z) —1 = (* + 274+ --- +2 — 1G), 
ere G(z) = 3°”, 0,2’, then b; is the number of proper 
fixes of T having cost 7. 


Hxample 2: 
Mew 1.6 = 2. 


Fi) = 2° 4+ 422; Fie) —l=(@+2- 1 - 
2 + 227 + 2+ 1); G(z) has a negative coefficient, 
violating 2). 


eens) Cy = ts. = 2. 
F(z) = 22° + 32* + 22° + 227; F(z) — 1 = (22’ + 
z2— 1)? + 2 + 2+ 1); F(z) is the USF of an 
exhaustive prefix code. 


ee, =, Cs = 3 = 2: 
F(z) = 2 + 82* + 22° + 22°; comparison with 
b) shows that F(z) is the USF of a prefix code. 


If 8 is the sequence s;, s:, °°: , S:, we define ®(8) = 
steiat---*eim Tsing this notation, we may give the follow- 
‘rule for constructing a code having a given USF F(z). 


1) Define a set S, initially equal to {¢}, and a poly- 
nomial H(z), initally equal to F(z). 

2) Choose an element 8 ¢ S, not previously selected. 
If (8) is a term in H(z), replace H(z) by H(z) — 
(6); if not, replace 6 by the r sequences 6-s,, 
BS, °° , B°S 
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3) Continue until H(z) = 0; S will then represent a 
code having the USF F(z). 

4) For each term z' in F(z) — F(z), delete an element 
B of S such that (8) = 2’. 


IIL. Linrar Constraints on PreFrx Coprs 


In this section we derive certain linear inequalities in 
the coefficients a; and 6;. Interpretations of these in- 
equalities in terms of the structures of codes will be given. 
In the following section, the inequalities will be used to 
express the construction of optimum codes as a problem 
amenable to solution by Gomory’s integer programming 
algorithm [7]-[9]. 

As before, let F(z) = >", a;2’, and G@) = 00" bz’. 
Then for exhaustive prefix codes, 


Fe) -1 = @* + --- +2 — IGE), 


and, equating the coefficients of z’ on both sides of the 
equation, we find that a, = 0, bo = 1, and, 


fOnge> 0; 


a; = Me O27, a b;, (9) 


with the convention that b; = 0, 7 < 0. For prefix codes 
in general, Corollary 2.1 implies that (9) may be reduced 
to an inequality, 


a; Ss DB Oe = 1Oe (10) 


Thus, the results of the previous section may be restated 
as follows. 


Theorem 3 


There exists a prefix code having a; code words of cost 7 
if and only if there exist nonnegative integers b; such that, 
for i a 0, a; < SD On oar b;, (a) 

k=1 


(b) 
dy = 05 Oy = 1507 = 10 Oe ee (c) 


for <f>\0; oma; 


The code is exhaustive if and only if equalities hold in (a). 
These conditions may be given a simple interpretation 
by noting that, for an exhaustive prefix code, the number 
of prefixes of length 7 terminating in s, is 6;_,,. Summing 
over all k, the number of prefixes of length j is }of_, Dizon 
Each of these is either a proper prefix, of which there are b,, 
or a code word, of which there are a;. Thus, we obtain 
the identity 
a; + 6; = 3, Ove (11) 
k=1 
For a nonexhaustive code, some sequences are unused, 
and (11) may be weakened to an inequality. Thus, the 
conditions of Theorem 3 are shown to be necessary for 
a prefix code; their sufficiency is established by Theorem 2. 
and Corollary 2.1. 


bo 


IRE 


Another interesting form of these inequalities is obtained 
by expressing the b; in terms of the a;. To assist in this 
we use matrix notation. We begin by considering only 
exhaustive prefix codes. Let the column vectors A and B 
be defined as follows: 


AL = 
= (il, Oip Idee &¢ 


5) da), 


ky Digs) 2 
Let P(x) denote the number of code letters of cost x, 


and let the (m + 1) X (m + 1 — c,) matrix M be defined 


as 


(—l, M1, Az, °° 


ua ee. 
LO as 
Then, according to Theorem 3, 


MB= A. 


otherwise. 


(12) 


Hxample 3: 


¢ = 1, C= 6, = ™ 
F(@® = 22° + 3e* + 222 +22), G@=e+2424+1, 
k= (SAL ye eee Oe Be=(Valeelelae 
M B= A 
iO .OmnO! eae 
Lee OMA 0 
ee clei AOU iN ee 
0 2 1-111 2 
0: Os 322 3 
Ce Or 0 2 2 


Let M denote the square matrix consisting of the first 


m + 1 — c¢, rows of M, and let A denote the vector 
consisting of the first m + 1 — c, elements of A. Then 
MB =A (13) 

and, if MW is nonsingular, 
M*A =B > 0. (14) 


We shall exhibit 17~’. Let the following recursive sequence 
be defined: 


h> 0. 


f 
Oa | 
& 
Vn-cxy 
k=0 


AWE AN Coser ee ENC 
beginning at h = 0: 


PL 2 onD 


2, the following sequence is defined, 


Woke tla 


In general, v, is the number of possible sequences of code 
letters having cost h. 


TRANSACTIONS ON 
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Lemma 4.1: 


(M~’);; = —Vi-;- 
Proof: Let H be an (m +1 X ¢,) — Gn -— 1 ae 


matrix such that (H);; = —v,_;. We must show tha 
m+1—cr 
(MH);; a pS (M) iH ai = 6j. 
a=1 


Using the definitions of MZ and H, 


(MH) ;; =e paul SU — Do, j-c,° (16) 
k=1 

Case 1:7 — 7 < 0. All the terms in (16) are zero, an 
CVE eee 0. 

Case 2:7 — j = O. All the terms except v,;_; are zeré 
(MH). =v = 1: 

Case 3:71 — j > 0. By (15), the value of (16) is zero 

Thus, (14) is equivalent to (m + 1 — c,) linear ine 


1)st of which is 
7 See a b; 
=! 


These conditions are equivalent to the first (m + 1 — ¢, 
equations implied by (12). The (s + 1)st equation, 


qualities, the (7 + 


S50, (17 


5 > m= css 
= Shae (18) 
k=1 
Substituting (17) in (18), we find that 
t= 50 > ena 
k-1 =1 p=l1 
-> Sse p= pl 
k= p=1 
Reversing the order of summation, we find that 
Sek r 
Oh, = UE, ye ay De pees, 
pl k=1 
and by (15), 
el 
a, =v, — Cle 
(hid 
or 
=» ope (19) 
p=l 


The results of the foregoing calculations are summarized 
in Theorem 4. 
Theorem 4 


There exists an exhaustive prefix code with a, words 
of cost p, and maximum code-word cost m, if and only if, 


fone 1h se eine — eee Da Oeste (20) 
Lore 1 =") ae ere ea = v;. (21) 
p=1 


) 
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prollar, y 4.1: There exists a prefix code having a, words 
pst p if and only if, for 1 < 7 < m, 


: 2) VIO Dee 
7 


(22) 


roof: Given a prefix code I with a USF mace (22), 
ollary 2.1 asserts that the coefficients a, may be in- 
sed to yield the USF of an exhaustive prefix code I’; 
the new coefficients would violate either (20) or (21), 
tradicting Theorem 4. Therefore, (22) is a necessary 
dition. To prove sufficiency, define 


— D Dis Cire (23) 
Le (15), we find that (23) becomes 
= 2 Oe a Oy (24) 
mor; > m, 
hee (25) 


on given a USF satisfying (22), we may increase its 
ficients a; to satisfy (20) and (21) by a process of c¢, 
ys, the kth of which is as follows: increase a,,.,-1 by 
set 6,, equal to zero, and recompute the 6;, 7 > m + 
- 1, according to (25). When the process is completed, 
conditions of Theorem 4 will be satisfied (with m 
efined as the maximum code-word length of the newly 
ived code). Thus, by Corollary 2.1, the original coeffi- 
its a; correspond to a prefix code. 
Ve note that, if (19) holds for m — c, < 7 < m, then 
olds also for gq > m, since 


Sos ae) 
k=1 


imilar comment holds for (22). 


seample 4: 
Mee a 8. 
fen, =, 0) = 1,0, = 2,0; = 497 = 7,05 = 18, 
= ES 
suppose m = 5, a, = 0, ds = 1, as = a, = Os = 2 
en 
Oy A cs ae ON i 


ji, — U101,,— 0, — 1, 
— 00 Uido — id sb, 
— 030, — U2. — 0103 — a, = I, 
bs = Vs — VsQ, — Vz3A2 — V203 — 1104 — As = 1. 
6 conditions of Corollary 4.1 are satisfied. Coefficients 
isfying (20) and (21) are obtained as follows: increase 


oy 1, ag by 2, and a; by 1, ae ae Gp SOR eS 
=) 20, = 3, de = 2, 07 =.1, 


The condition (22) lends itself to a simple combinatorial 
interpretation. Since there are exactly v;_, sequences of 
code letters having cost 7 — p, each code word of cost. 
p is the prefix of v;_, sequences of cost j. By the definition 
of a prefix code, no sequence can have two distinct code 
words as prefixes. Therefore, the number of distinct 
sequences of cost 7 having code words as prefixes is 
>"3_, v;-,a,, and this number may not exceed »;, the 
total number of sequences of cost 7. This argument 
establishes only the necessity of (23); Corollary 4.1 estab- 
lishes its sufficiency as well. 

Recalling the definition of t as the unique positive root of 


a eel See a ON as Aly (26) 
we assert that, for a uniquely decipherable code, 
Bee a (27) 
i=1 


For, if the contrary is assumed, let 
SS tae 
i=1 


hen pean ieee 


0) 


DEP: log. px = 6+ ae, 
7=1 i=1 


contradicting Shannon’s Fundamental Theorem. 

It is of interest to compare (27) with the conditions of 
Corollary 4.1. It is well known (Lagrange [12]; see also 
Dickson [4]) that, for a sequence having the rule of 
formation given by (15), the general term v, may be 
expressed in terms of the roots t, ft, t., ot (26); 
If the roots are distinct, 


cr 

-h 

Oy == > A3b; ce 
i=1 


In the case of a repeated root t;, having multiplicity pu, 
\; 1s replaced by 


(Axo AIP Aish ap ‘ 


Aes 


; “te eae) . 


It can be shown, using a theorem of Cauchy quoted by 
Marden,’ that t, the root of (26) having least absolute 
value, 1s unique, and is the only positive root, provided 
that the ¢, have no common factor. Then 


J iF +: o(! ) 
On+1 _| 


where u is the second smallest root of (26), and 


~ 


é Vi, 
lim = if. 
noo Vast 


Dividing through by v, in (22), one obtains 


3 Marden, [15], p. 95. 
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and, except for the “parasitic”? roots of (26) (d.e., those 
other than t), this would reduce to 


SS ris SS aly, 
p=1 


which is implied by (27). In the case c, = 1, there are 

no parasitic roots, and we can prove Corollary 4.2. 
Corollary 4.2 (Kraft-Szilard inequality): If ¢ = c = 
- = c, = 1, there exists a prefix code having a; words 

of cost j, and maximum code-word length m, if and only if 


(28) 
Proof: In this case », = rv,-. = 7”, and (22) reduces to, 


for. <9 S77; Diy tas <7 
p=l1 


or, equivalently, 


fote sje 


i 
| De aa 
Daa 


which is equivalent to (28). 


IV. An IntTEGER PROGRAMMING MrtHop 
FOR CONSTRUCTING OPTIMUM CoDES 


In the previous two sections, various necessary and 
sufficient conditions for the existence of prefix codes have 
been obtained. In this section, these results are employed 
in the construction of optimum prefix codes. The problem 
may be stated as follows: 


Minimize 


subject to, 


Dorel Oe On <0 


for, ly<cji< im) and, 1b; bea) 
k=1 
or, equivalently, 
LOR leery enres iG On (29b) 
p=1 


In order to make this a well-stated minimization problem, 
it is necessary to express the relations between the vari- 
ables /;, which give the costs of specific code words, and 
the a;, which tell how many code words of a given cost 
occur. In the special case when p, = po = --: = pp, = 1/n, 


. i<. 

Da Pili = — Dy jas, 

t=1 NM 7=1 
and (29), together with the constraint 


m 
DE ats 
a! 


Janua 


is sufficient. More generally, we may define 


\ if l=] 
Ce = 
0 otherwise. 
Then C 
l; ine Be. a= Do ysis 
7=1 t= 


and the problem becomes 
min >) >> pijyss (3 
i=1 7=1 


subject to 


06 = Os een) 


n 


Dy Yas 


1=1 


Gla 


1 <7 <n, =D Dg ea : 
p=1 i1=1 


and : 


ja PESO +b< >> Oe 
k=1 


for 
or equivalently 


for 


m 


¢ 


together with the requirement that all variables y;; an 
b; be non-negative integers. 

The problem has now been cast as the minimization 0 
a linear function subject to linear constraints, togethe 
with the requirement that the variables assume intege! 
values. Gomory [7|-{9] has developed a computationa 
method called integer programming for solving problem: 
of this type.* In principle, this method can be appliec 
here, once an upper bound m on the costs of code word; 
has been set. Since mn integer variables y;; are involved 
however, such an approach would permit the solution o 
only very small problems. To extend the practical scop¢ 
of the method, it is necessary to reduce the number o 
variables actually needed in the computation. This prob 
lem will be considered next. 

Let the X,; be so ordered that, if 7 < k, p; > p,. Ther 
there exists an optimum code such that J; < 1, wheneve' 
7 < k; such codes will be called monotone. All optimun 
codes are monotone unless some of the p; are equal 
Let @(z, 7) be a lower bound on the cost of a monotone 
code such that y;; = 1. Suppose y;; is set to zero in (30 
and (31) whenever @(z, 7) > @o, and the resulting problen 
in a reduced number of variables is solved. Then, if thi 
solution obtained has cost less than or equal to @o, i 
is the optimum solution to the original problem. If thi 
@(z, 7) are sharp lower bounds, and @, is properly chosen 
the number of variables in the integer programming calcu 
lation may be reduced greatly. 


{ORs tere rte 


* The author is indebted to Dr. R. E. Gomory for helpful dis 
cussions of the theory and practice of integer programming, an 
for making his IBM 704 program available. 
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e recall that the condition 
Dei sa eal 
1=1 p=1 


(32) 


ecessary for the existence of a prefix code, and that 
constraint 


oD Apt” < i 
p=1 


(ie by (382), closely approximates the jth linear con- 
bint in (29b), with an exponentially vanishing error. 
eed, we have found empirically that the replacement 
loo) by (32) changes the minimum value of >>"_,p,l; 
le, if at all. Thus, a sharp bound @(2, jo) can be found 
solving the following problem, which will be shown to 
ire very little computation: 


min >> pil; 
Det 


ject to 
= <1, (a) 
fore 7 <a) hy (b) 
(OL te ata bee tG 5; (c) 


ere (b) and (c) are required for monotonicity. Here, 
; simply a constant which may be obtained from (26) 
standard numerical procedures. 

Chis problem can be converted to an integer program- 
ng problem by defining 


en 
ee a (1 a t) DS fa tes is a 
7=1 
uking these substitutions, one obtains 


n m 
min 3 Dip 
~=1 g=1 


ject to 
ve, 2S (33) 
a=1 7=1 ot t 
for v a oy i = Jo) L335 = 0 (34) 
for 4 > 1%, LeSae ite dhe (35) 


In order that the x,;; may meaningfully define the J;, 
is necessary that, whenever 7, < jo, %i3, > %;,. Hi, 
wever, a solution to the above problem had 2;;, = 0, 
| = 1, jp > ji, a solution having lower cost could be 
tained by interchanging the values of x;;, and 2;;,; 


therefore, the optimum solution to the above problem 
will automatically satisfy this requirement. 

Apart from upper bounds on the values of the non- 
negative variables x;;, and conditions which give pre- 
assigned values to the x;;, (33) is the only constraint to 
be satisfied. Integer programming problems of this type 
are called knapsack problems, and are relatively easy to 
solve [3]. If the constraints are weakened to admit non- 
integral values for the x;;, the solution becomes particu- 
larly simple. The variables x;; which do not have pre- 
assigned values are set equal to | in increasing order of 
the quantity p;t’ until, after a variable x,, has been 
set equal to one, (33) is satisfied. The value of x,, is then 
decreased so that (33) is satisfied as an equality. 


Example 5: Let n = 10;r = 2,¢, = 1, = 3. 
Py te= 0; t =10,082: 
Let the p, be 0.23, 0.20, 0.16, 0.11, 0.08, 0.07, 0.05, 0.04, 


0.03, 0.03. We shall determine @(2, 5). In the matrix 
of Fig. 2, the 7, 7 element corresponds to the variable 2;;. 


Sd 
q 


1 
2 
3 
4 
5 
6 
7 
8 
9 


pi 
Oo 


Fig. 2—Matrix used in computing @ (2,5). 


The dotted area corresponds to those variables having 
the preassigned value zero, according to (34). Variables 
in the cross-hatched area have the preassigned value 1, 
according to (35). The remaining elements are numbered 
in the order of their inclusion in the solution. The con- 
straint (33) is first satisfied when 77, is set equal to 1; 
it is satisfied as an equality when 27, = 0.33. The following 
solution is thus obtained: 


ila = 4, iB = i — ib = oF ths = 6, 
ie a ie ihe = 7.38, lhe = 8, ll = hig — 9, 
Dd pili = 5.46 


Therefore @(2, 5) = 5.46. 

Once the cells of the matrix have been arranged in in- 
creasing order of p;-t’, little calculation is needed to 
determine @(7, 7) for any values of 7 and 7. 
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When p, is small, 7 can be made very large without 
causing a large increase in the value of @(z, 7). However, 
if A, is an upper bound on 1;,, — l;, for an optimum code, 
we may define the following improved bound: 


C(g, h). 


= min 


7—Ai-gShSi 


Cr(t579) max 


g9>0 


This bound will be of value principally when 2 is close to n. 
We state the following result without proof: 
Ifk = (r — 1)Q(k) + R(k), 0 < R(kK) < r — 1, then 


(QU + ier HF Cr-1-3 


A; max = Cees 


ToS OORT 


It remains to discuss the choice of @y. Clearly, Co 
should be large enough so that there exists a code of cost 
less than or equal to @,, but otherwise as small as possible. 
In the examples treated thus far, a near-optimum code 
was constructed by cut-and-try procedures, and @, was 
taken as the cost of that code. A more systematic procedure 
would be based on the empirical observation that optimum 
codes nearly always have efficiency quite close to one. 
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Thus, one can compute 


Cam = DP: log, Pi» 


and set €, equal to @,;,/H, where E is, say, 0.99. 
a code having cost less than @, is not found, a smalle 
value of # must be tried. | 
The success of the techniques for eliminating variable 
is best evidenced by the results of the integer program 
ming calculations that have been carried out, using al 
experimental IBM 704 program. In these calculations, t 
constraint (381b) was used, rather than (81a), so that th 
b; did not appear explicitly as variables. Also, if J; hag 
the possible values j,, jo, , Js: Yii, WAS representet 
as 1 — Yij, — Yin °°° — Yai, The results are given i 
Table I. . 
For codes 3 and 4, the alphabet encoded was the Roma 
alphabet plus “space,” using the probabilities given bj 
Brillouin’ in 1956. Table II gives the costs of the cod 


: 
‘ 


5 Brillouin, [2], p. 52. 


TABLE I 
SUMMARY OF COMPUTATIONAL RESULTS 


Code Number 1 2 3 4 
Costs of Code Letters iD il. % le Zo pind) 
n 10 12 Di Qe 
Numbers of Variables 10 9 14 61 
Gram =) DCs (On loys fs 5.4164 4.8154 5.8270 6.6829 
Cost of Solution 5.44 4.85 5.8599 6.7324 
Efficiency of Solution 0.9957 0.9929 0.9944 0.9926 
Computer Time Used 1 Minute 1 Minute 1 Minute 5 Minutes 
TABLE II 
Two Optimum CopEs FoR THE RomMAN AupHaBer Pius ‘“‘Spacn”’ 
CODE 3 CODE 4 
Probability Cost of Code Cost of Code 
Letter of Letter Word Code Word Word Code Word 
Space 0.200 3 Is A al 4 2 2 
E 0.105 5 Pak ike 5 By 2 
at 0.072 6 ik al BB 6 Bees! 
O 0.0654 6 ee ey 6 SHO 
A 0.063 6 De leaeiaee? 6 By By 
N 0.059 6 ly PR Pe aD iG Bh B38 
I 0.055 6 2 Aly PAs 7 2 2 
R 0.054 6 Phe Mil eeral a 3) eee 
S 0.052 6 a ee ale | 8 2 Bs BY 
H 0.047 6 lp 2a el lca 8 BOR 
D 0.035 iu ik Bik th 8 @ 2 at 
L 0.029 7 Di, Wee Pee Al 8 eB & 2 
C 0.023 8 2 ee sD eat 9 Se a oF 
F 0.0225 8 DPN ye Pde NOs I 9 mo oat 
U 0.0225 8 2 el 10 ey BG 
M 0.021 8 ib al ee ah OF al 10 BB Bh & 
Pe 0.0175 9 2 A 2D ge: 10 Py BN BS DY 
W 0.012 9 P92 WDD 11 PB 8 BY 
vi 0.012 9 Ik al eb OF 11 Yi BAS RYE BS 
G 0.011 9 Dies BF pa vil 11 Dy Be BY BN 
B 0.0105 10 I PF OX DB I 11 PR. BY ORB 
iv 0.008 11 PA OL D2 OF De 11 Whe Te Bh, Wee 
K 0.003 12 De ee a, 14 ye) Be B Bs 
x 0.002 13 2 CDPD ESO ane ee 14 Pi yO) = Bi. BY 
J 0.001 14 Di Py ROA IX Ml 15 yA By BB OM 
Q 0.001 14 LOD Pe A OX ~ I 16 2 OL ONES ne 
Z 0.001 15 PPS DP D2. oil, oP), 23 16 7} 8) So SB BD BY 


s in these codes, and gives specific prefix codes 
zing these costs. In code 3, the code letters are 
ted by 1 and 2; in code 4, they are denoted by 2, 3, 
3’. Code 3 may be compared with one of cost 5.8629, 
‘ined by Marcus [14] in 1957 using a partially system- 
procedure. 


V. FurTHER PROBLEMS 


ections II and III give several alternate algebraic 
tacterizations of prefix codes. In contrast, an algebraic 
racterization of the larger class consisting of all 
yuely decipherable codes has not been achieved, al- 
ugh Sardinas and Patterson [17] in 1953 have given 
mite procedure for deciding whether a code belongs 
his class. We conjecture that, for every uniquely 
ipherable code, there exists a prefix code having the 
ie USF; Marcus [14] has made a somewhat similar 
yecture. 

urning to computational considerations, we note that, 
sn the costs of code letters are high, and the channel 
acity therefore low, an inordinately large number of 
jables is required in the algorithm of Section IV. This 
y be remedied by introducing smaller integer costs 
ch are approximately proportional to the given costs, 
_ solving the resulting problem. This technique, how- 
r, yields only an approximate solution. Special tech- 
ues have been developed for obtaining strictly optimum 
es in this case, but their investigation has not yet been 
pleted. 

everal problems of interest result when further con- 
ints are imposed on the coding problem. For example, 
practical communication systems, it may be desirable 
impose an upper bound on the costs of all code words. 
e coding problem with this constraint can be solved 
using the algorithm of Section IV; in fact, the externally 
yosed upper bound simplifies the process by determining 
parameter m in advance. Huffman’s method of con- 
icting optimum codes, which is applicable when all 
e letters have equal cost, does not seem readily adapt- 
e to this constrained problem. Thus, even in the case 
qual costs, the algebraic approach has the advantage of 
led flexibility. Table III gives length distributions for 
» encodings of a twenty-seven letter alphabet, with 
babilities given by Gilbert and Moore [6] in 1959 for 
inary channel with c, = c, = 1. The first is the optimum 
e without constraints, and has cost 4.1195; the second 
optimum subject to the constraint that, for all 2, 
< 7, and has cost 4.1490. 

‘tructures isomorphic to those of prefix codes occur 
many problems of classification and identification, 
sing in such diverse fields as sorting, character recogni- 
1, and medical diagnosis. Such problems have been 
sidered by Schiitzenberger [18] in 1954 and Moore’. 


BE. F. Moore, private communication; July, 1959. 
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As an example, suppose we are given the probabilities 
of occurrence of a set of objects, and a set of tests, having 
known outcomes for each object, which may be applied 
to determine the identity of an unknown object. Such a 
situation is specified in Table IV, in which each test has 
the two possible outcomes, zero and one. 


TABLE III 
ReEsu.t oF CopInG witH CoNsTRAINTS 


Length in Length in 
Unconstrained Constrained 
Code Code 


Letter Probability 


1859 
1031 
0796 
0642 
0632 
0575 
0574 
0514 
0484 
0467 
0321 
0317 
0228 
0218 
0208 
0198 
0175 
0164 
0152 
0152 
0127 
0083 
0049 
0013 10 
0008 10 
0.0008 10 
0.0005 10 


Space 


PYQYQOVOS Sooo SoS SSeS SsaoeooS 
BND DD DHD HDHD OVOvSvSr Se PE EE PB PE WO 


NOAH ASBVOK SS SOCOM ER eArorys 


SNNN NN AARAAAMMWHOOOME EL PEE WW 


TABLE IV 
Object Probability Tests 
Xe; De ab @€ a @ i 
Xi 0223 ih il ak @) @ © 
X2 0.20 toi O@ © it © 
X; 0.16 Oil © O i dl 
X4 0.11 IO ak Oil 
X5 0.08 LO O © a i 
X65 0.07 OO al) Sale al ah 
X7 0.05 oO Os ih 2b a 
Xs 0.04 LL Ls Ongy ees! 
X 4 0.03 WO the iD a 
X10 0.08 il A ale geal Ge al 


An identification procedure for this example is shown 
in Fig. 3. At each node of the tree a test is performed, and 
one branch or the other is followed, depending on the out- 
come of the test. Unique identification is achieved when a 
terminal node is reached. The expected number of tests 
required for an identification is }> p;l;, where J, is the 
length of the path leading to the terminal node associated 
with X,. 
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Fig. 3—Tree for an identification procedure 


Suppose that the optimum length distribution for the 
given alphabet {X;} has been found. Then it is not difficult 
to determine whether this length distribution is realizable 
by the given set of tests. If it is not realizable, the 2nd 
best, 3rd best, --- , kth best, --- distributions may be 
considered in turn. This is done using integer program- 
ming by inserting at each stage a constraint which rules 
out the last distribution obtained, and no other. The 
incremental cost of obtaining each such solution, once 
the first has been obtained, is not great. Of course, if 
the given set of tests is not highly effective, many distri- 
butions will have to be inspected, and the method will 
not be feasible. It is hoped that further investigation 
along these lines will yield improved methods for treating 
classification problems. 


[1] 


[14] 


— 
or 
= 


[18] 


[19] 


|] G. B. Dantzig, draft of “Discrete variable extremum problems 


] L. E. Dickson, “History of the Theory of Numbers,” Carneg 
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| Ek. N. Gilbert and E. F. Moore, ‘‘Variable length binaj 


] R. E. Gomory, “Outline of an algorithm for integer solutio 


| R. E£. Gomory, ‘‘An algorithm for integer solutions to line 


] D. A. Huffman, “A method for the construction of minimu 
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Note on Signal-to-Noise Ratio in Band-Pass Limiters* 


CHARLES R. CAHN}, MEMBER, IRE 


mmary—A simplified analysis is presented to explain physically 
hange of signal-to-interference ratio which occurs in a band- 
limiter, The analysis utilizes the concept of sideband resolution 
symmetric and anti-symmetric parts and considers only the 
ptotic case where the signal-to-interference ratio is small in 
arison with unity. Wide-band correlation-detection systems 
iscussed, as well as ordinary band-pass systems. 
e important conclusion is reached that the degradation is 
y dependent on the statistics of the interference amplitude 
ations. However, when the signal is weak compared to the 
erence, the maximum possible degradation is 6 db and occurs 
mstant-amplitude interference. 
>gradation with noise interference in a wide-band correlation- 
ction system has been obtained for arbitrary signal and noise 
lwidths. It is found that the degradation ranges between 0.6 db 
1.0 db, the latter figure being for the case where the signal 
lwidth is greater than approximately three times the noise 
lwidth. 


INTRODUCTION 


NHE EFFECT of an ideal band-pass limiter on 
signal-to-noise ratio has been rigorously analyzed’ 
~ for the case of a sine wave in Gaussian noise inter- 
ace. The results indicate that the input signal-to-noise 
o is degraded only by a numerical factor close to unity, 
maximum degradation being 4/z (1.0 db) for signal- 
oise ratios much less than unity (0 db). This basic 
It has been extended’ to allow calculation of the effect 
he band-pass limiter on signal detectability, assuming 
reak signal-to-noise ratio at the limiter input and 
issian noise interference. The results show that the 
ter produces a very small loss in signal detectability; 
example, the factor is only 1.16 (0.6 db) for a wide- 
d rectangular noise spectrum. 
he analyses referred to in the above paragraph do 
admit an easily grasped physical picture which shows 
rly why the desired signal in the presence of strong 
rference is not highly suppressed by the nonlinear 
on of the limiter. Furthermore, the analyses are valid 
‘ for Gaussian noise interference and do not present 
scted performance for arbitrary interference statistics. 
irther limitation is the restriction that the interference 
dwidth be much wider than the signal bandwidth. 
, is the purpose of this paper to present a simple 
lysis which, while applicable only for the case of a 
al much weaker than the interference, does provide 
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W. B. Davenport, “Signal-to-noise ratios in band-pass limiters,”’ 
ppl. Phys., vol. 24, pp. 720-727; June, 1953. 
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stability in band-pass limiters,’ IRE Trans. on INroRMATION 
ory, vol. IT-4, pp. 34-38; March, 1958. 
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a simple physical picture and does not have the limitations 
mentioned above. The analysis will lead to a degradation 
factor which relates the output signal-to-interference ratio 
to the input ratio. However, as in the analyses men- 
tioned,” only long-term averages are considered. It is, 
of course, recognized that short-term fluctuations can be 
significant in many practical situations where the ap- 
plication of a limiter might be considered. In addition, 
the interference is assumed noncoherent with the signal. 
A special analysis is required to treat coherent inter- 
ference, which can greatly suppress the desired signal in 
certain cases. 

The use of signal-to-interference ratio as an indication 
of system performance is common, in practice, for sim- 
plicity and is adopted here for this reason. It is, of course, 
true that the problem of signal detection has been studied 
extensively, and a more accurate and general statistical 
detection theory has been evolved to replace the simple 
criterion of signal-to-interference ratio.” 


CasE or Two SINE WAVES 


The case which serves as the basis for simple anal- 
ysis of more general inputs is that of two sine waves. 
For this case, the limiter input may be expressed as 
A,[cos w,t + a cos wot], where a denotes the amplitude 
ratio and is less than unity. The output of an ideal band- 
pass limiter, which removes the amplitude modulation 
without distorting the phase modulation,* is obtained by 
dividing the input by the instantaneous envelope, yielding 


A, [cos w,f + @ COS wot] 
AGA/ (et a? 220 Cos Ga ens 


terms proportional to 
higher powers of a. 


a 
COs wt + 5 COS Wot 


- cos (2w,; — ws)t + (1) 
It is seen from (1) that the limiter suppresses the weaker 
signal, relative to the stronger signal, by an amplitude 
factor of 2 (6db) and produces cross-modulation com- 
ponents, only one cf which is of significant amplitude. 
In addition, the phase of each of the two signals is not 
affected by the limiter. 

The above result, obtained by a series expansion method, 
is more easily established by considering the weaker 
signal as a sideband of the stronger signal and using the 
concept of symmetric and anti-symmetric sidebands.° It 


3D. Van Meter and D. Middleton, ‘‘Modern statistical ap- 
proaches to reception in communication theory,” IRE Trans. on 
INFORMATION TuHHorRy, no. PGIT-4, pp. 119-145; September, 1954. 

4W.B. Davenport and W. L. Root, “Introduction to the Theory 
of Random Signals and Noise,’ McGraw-Hill Book Co., Inc., 
New York, N. Y., p. 288; 1958. 

5§. Goldman, ‘Frequency Analysis, Modulation and Noise,” 
McGraw-Hill Book Co., Inc., New York, N. Y., pp. 167-181; 1948. 
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is easily shown that the symmetric sidebands produce 
amplitude modulation only, and to a first approximation 
for weak sidebands, the anti-symmetric sidebands produce 
phase modulation only. Since the ideal band-pass limiter 
suppresses the amplitude modulation, only the carrier 
and the anti-symmetric sidebands are retained in the 
limiter output. The first-order terms on the right side 
of (1) are observed to be exactly these components. This 
approach may be generalized to include a multiplicity of 
frequency components about a single strong carrier. Each 
component is found to be independently affected by the 
limiter to the first approximation and, accordingly, is 
suppressed by an amplitude factor of 2 and is unchanged 
in phase. 

Since the relative amplitudes and phases of the various 
frequency components of the desired signal are unchanged 
despite strong sine wave interference, the significant obser- 
vation is made that the limiter output contains an un- 
distorted replica of the desired signal. In fact, this conclu- 
sion applies even if the sine wave interference is phase 
modulated, and explains physically why the desired signal 
is not highly suppressed by the strong interference. 


CASE OF A SINE WAVE AND STRONG GAUSSIAN 
Noise INTERFERENCE 


If the interference is Gaussian noise and strong compared 
to the sine wave, it may be considered as a modulated 
carrier. The input to the limiter then may be expressed as 


input = A, cos (w,t + 6,) + V/28 cos Wot, (2) 


in which the phase 6, of the noise interference is random 
and the amplitude A, has a Rayleigh distribution. For 
simplicity, #, and w, are assumed unequal, although a 
more complicated argument can be used if w, = a, to 
yield the same result. For a fixed noise amplitude 4,, 
the sine wave, being a weak sideband, gives rise to two 
output sine waves of equal amplitude, one of which is 
at the frequency w, and in phase with the input sine wave. 
The other, at the frequency 2w, — w., has the random 
phase of the noise. Although both output sine waves 
have the amplitude S/ 2 A,, there will be no average 
output at the frequency 2w, — w., because of the random 
phasing. 

The average amplitude (or steady component) of the 
sine wave output at the frequency », is obtained by 
averaging over all possible noise amplitudes, as follows: 


Average sine-wave amplitude = : ee 
i < en An/2Nn as us 8S 
WL aOR © 


where the bar denotes an ensemble average and N is 
the average power of the interference. The output noise 
essentially has a power of 3, since the limiter output is 
a phase-modulated sinusoid of unit amplitude and the 
desired signal component is much smaller than the noise 
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component. Thus, the orgs signal-to-interference ratio 
vs Se 
(Vz T (4) 


GF. = 1/2 N/, ( 


which is identical with the result obtained by Daven 
by a rigorous treatment of this case. 

It should be noted that with Gaussian noise intl 
ference, the average sine wave amplitude in the limite 
output can be evaluated in closed form for an arbitrar 
signal-to-interference ratio directly from (2) divided b 
the instantaneous envelope.°’’ Since the total outp 
power from the limiter is always 3, the output signal-te 
interference ratio can also be expressed in closed forn 
However, a similar closed-form result for non-Gaussia 
interference does not appear to exist, in general. 


CASE oF A SINE WAVE AND STRONG 
Non-GAvuSSIAN INTERFERENCE 
The calculation made for Gaussian noise (Rayleig 

distribution of amplitude) can be generalized to includ 

interference with an arbitrary amplitude distribution, o 

the assumption that the output interference power 1 

much larger than the output signal power. From (3) an 

the fact that the output interference power is essentially 3 

the output signal-to-interference ratio is found to b 


By _ S q= : 
(x), "Pay “ 
On the other hand, the input signal-to-noise ratio 1 
S| we 2s 
is hue e « 


Therefore, the degradation in signal-to-interference rati 
due to the limiter is given by the factor 


CS/N) in = 4 2 
CSN eatin A) 

As an example, (7) may be used to derive a closed-forr 
expression for the degradation when the interference is 
combination of a steady component and a Gaussian nois 


component. The amplitude of this combination has th 
probability density function 


P(A.) = A,e | 


= 


(7 


(A, pea p 


o( V 2yA,) (s 


where y is the power ratio of the steady component an 
the noise component. If the averages in (7) are evaluatec 
the result is obtained as follows: 


V/A = Fy t+ le Toly/2)P, ( 


which is graphed in Fig. 1. 


®N. M. Blackman, “The output signal-to-noise ratio of a powe 
law device,” J. Appl. Phys., vol. 24, pp. 783-785; June, 195 

ls iS) Reed, ‘An Approximation to the Output Signal-to-nois 
Ratio of a Signal, Passed through a Band-Pass Limiter, Followe 
by a Narrow-Band Filter,’ Lincoln Lab., Mass. Inst. Tech., Le: 
ington, Group Rept. 47. 17; June 2, 1958. 
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1—Degradation for a combination of steady and Gaussian 
interference. 


he graph in Fig. 1 shows how the degradation due 
ne limiter increases to a limiting value of 6 db as the 
uation in the interference amplitude decreases. The 
st possible degradation (6 db) actually occurs for 
rference with a constant amplitude, as may be verified 
ipplication of the Schwarz inequality.* The proof con- 
of verifying the successive inequalities, 


A.) = A, AS) > 1. (10) 
inequalities become equalities only if 
Az = (A,)’; (11) 


sh means that the variance of the probability distri- 
on of amplitude is zero. This occurs only if the 
litude is constant. On the other hand, there is no 
r limit; that is, probability distributions with the 
ropriate behavior at A, = O render (7) arbitrarily 
ll.” This nonmeaningful result arises from the ap- 
imation that the interference should always be strong 
pared to the sine wave. Furthermore, the signal-to- 
rference ratio is not an accurate criterion of system 
ormance in such a case. 

nee the interference is assumed to be noncoherent, 
average output exists at the frequency 20, — as, 
this term is not considered further in the discussion. 


WipdE-BAND CoRRELATION SYSTEMS 


communication system can utilize a wide-band signal 
. a complex waveform, the objective being to trade 
1width for interference reduction. Such a system may 
ze correlation detection as an effective narrow-band 
ption technique.’’''’ While correlation detection with 
atched reference is an optimal procedure for signals 


Birkhoff and MacLane, “‘A survey of Modern Algebra,” The 
Millan Co., New York, N. Y., p. 183; 1948. The proof using 
chwarz inequality was initially developed by Dr. R. E. Graves, 
> Tech. Labs. 

For example, distributions with a behavior at zero amplitude 
e form A,*, where a < 0, cause the second average in (7) 
divergent. 

P. E. Green, Jr., ‘ The output signal-to-noise ratio of correla- 
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0-18; March, 1957. 

R. Price and P. E. Green, Jr., A communication technique 
wultipath channels,’ Proc. IRE, vol. 46, pp. 555-570; March, 


corrupted by additive white Gaussian noise,” it still 
often is used in practice when interference of a more 
general nature is present. For practical reasons related 
to gain control, a limiter can be incorporated in the 
receiver, as indicated in Fig. 2, which illustrates the basic 
coherent type of correlation processing. It is desired to 
calculate the degradation introduced by the limiter, 
determined by comparing the output signal-to-interference 
ratios obtained with and without the limiter. The restric- 
tion is made that the interference is much stronger than 
the signal, which is the case in practical situations where 
wide-band signals are used for interference reduction. 


BAND-PASS 
LIMITER 


PRODUCT 
DEMODULATOR 


SIGNAL PLUS 
INTERFERENCE 


LOW-PASS 
FILTER 


MATCHED SIGNAL 


Fig. 2—Coherent correlation system. 


Since the assumptions that the interference is strong and 
the signal is wide-band usually ensure that the output 
interference from the narrow-band low-pass filter is 
essentially Gaussian even when the limiter is used, the 
calculated signal-to-interference ratio degradation can be 
interpreted directly as a performance degradation. 

The input to the receiver may be expressed as s(t) + 
n(t), where s(t) is the desired signal waveform and n(f) 
is the interference waveform. The calculation of output 
signal-to-interference ratio will be performed first in the 
absence of limiting. Then the product demodulator output 
is s(t) + s(t)n(t), the first term of which is the output 
signal and the second is the output interference. Thus, 
the average or de output is 


de output = s(t) = Sy (12) 


using S for the average power of the desired signal s(t). 
The output interference term s(t)n(t) is the product of 
two independent waveforms, so that its autocorrelation 
function is the product of the individual autocorrelation 
functions. The power spectral density of the output inter- 
ference can therefore be obtained by convolution of the 
respective power spectral densities, S(f) and N(f), of the 
input signal and interference. Since the output interference 
spectrum is essentially constant over the significant pass- 
band of the low-pass filter, only the zero-frequency value 
N u+(0) is needed and is given by 


Nou) = [| SON at, (13) 


in which one-sided spectral densities are utilized. 


2 P.M. Woodward, ‘Probability and information Theory with 
Applications to Radar,’ McGraw-Hill Book Co., Inc., New York, 
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The output interference power is equal to the product 
of N.y.(0) and the noise bandwidth b of the low-pass 
filter.’ Thus, the output signal-to-interference ratio is 


given by 
Y y2 
( =) S 
N out a 


: i S(DN(f) af 


(14) 


which is a special case of (9) of Green’s article.’° In 
addition, fluctuations of the signal term s*(t) in the 
product demodulator output will also be present in the 
output of the low-pass filter. This fluctuation has been 
called self-noise, despite the fact that it is completely 
predictable from the wave-form s(t)."° However, when 
the interference is strong, the self noise is negligible, and 
for this reason has not been included in (14). 

When the interference spectrum is uniform, V(f) = No, 
(14) may be shown to reduce to the well-known formula 
for the peak signal-to-noise ratio from a matched filter.”* 
To demonstrate this, the low-pass filter is assumed to 
be an ideal integrator with a rectangular impulse response 
of duration 7." The noise bandwidth of this filter is 
easily calculated to be 1/27, so that (14) becomes 


Ss OTtSe 
N out vn a ae 
No [ S(f) af 
0 


27'S 
No 


2H 
saNig 


(15) 


where 4 = T'S is the energy of the signal over the time 
interval T. 

When limiting is performed on the input, the output 
signal-to-interference ratio becomes dependent on the 
amplitude distribution of the interference. Since the inter- 
ference is strong, it is convenient to express the desired 
signal as a superposition of various frequency components 
and the interference as a modulated carrier with amplitude 
A,, as in (2). Comparison of (2) with the first part of (3) 
shows that the limiter reduces the amplitude of each 
ey component of the desired signal by the factor 
A,'/2, so that the useful de output amplitude is also 
reduced by this factor. The output interference power has 
only a negligible contribution from the desired signal and 
may be evaluated from knowledge of the spectral density 
of the interference at the limiter output by an integral 
similar to (18). This distorted spectral density will be 
denoted by N,(f) and has a total power content of 4 
The degradation in output signal-to-interference ratio 
produced by the limiter then may be expressed as 


1 J. L. Lawson and G. E. 
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McGraw-Hill Book Co., Inc., 
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(SN uate 4 i SDN) af 


(S/N) himiter 


A = = = . 
(A;)? if S(DN(A af 


It should be emphasized at this point that the degradati 
A defined by (16) is not, for general interference spectr 
a degradation from an optimum detection process, bt 
only an expression of the effect of a limiter in a syste 
using correlation detection with a matched referene 
If S(f) is essentially constant with the value S(fo) ov 
the significant portions of N(f) and N;(f) (narrow-bam 
interference), (16) may be approximated as 


4st) [Nuff af , 
AS “See q 
(As)? S(fo) wee N(f) af 


since the integral in the numerator is simply 3, and t 
integral in the denominator is A;/2. Eq. (17) is identic 
with (7). Hence, when a limiter is employed, the maximur 
degradation with narrow-band interference is 6 db, fo 
lowing the same argument used in connection with (7; 
and occurs with constant-amplitude interference. Actuall 
this conclusion is independent of interference bandwidt 
when the interference amplitude is constant, since NV ;( 
is proportional to V(f) (no distortion), so that (16) reduce 
to a value of 4. 

The only other example which will be treated in deta 
is the case where the interference is Gaussian noise an 
both noise and signal have rectangular spectra with th 
same center frequency fy. The bandwidths are denote 
by By and Bs, and the powers by N and S, respectively 
The spectral density N_,(f) of the interference at th 
limiter output is indicated in Fig. 3, using a result o 
Price.'° The average of A>* in the denominator of (16 
may be evaluated as shown in (38). The value of the inte 
gral in the denominator is either SN/Bs or SN/By 
depending on whether Bs > By or Bs < By. It is thei 
found that the expression for the degradation due to th 
limiter is either 


8 fotBs/2 
aif" wipdt Cs> Bo). aM 
or 
SB, fotBsg/2 
N= Be ee N1(f) af (Bax Bae (19 


Note that the integral in either (18) or (19) specifies th 
total output interference power contained within the band 
width By of the desired signal. 


16 R. Price, “A note on the envelope. and phase-modulate 
components of narrow-band Gaussian noise,’ IRE Trans. 0: 
INFORMATION THEORY, vol. IT-1, pp. 9-15; September, 1955. 
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Fig. 83—Output spectrum of band-pass limiter. 


‘wo extremes may be considered. First, let the signal 
dwidth be much larger than the noise bandwidth, so 
t the “tails” of N(f) are included in the integral of 
). The integral then is simply 4, and A = 4/z (1.0 db). 
ond, let the noise bandwidth be much larger than the 
ial bandwidth, so that (19) becomes 

A= eS B.(0:456/8y)-= 1.16 (20) 

a Bs 

).6 db, the result obtained by Manasse, Price, and 
ner, when the signal is assumed at the center of the 
se spectrum. The transition between the two extremes 
hown in Fig. 4. It is interesting to note that the least 
radation occurs when the signal and noise have the 
e bandwidth, although from a practical point of view 
variation with bandwidth is very small. 


CONCLUSIONS 


he simplified analysis of the effect of an ideal band- 
; limiter on signal-to-interference ratio has led to 
Its in agreement with those obtained by more rigorous 
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Fig. 4—Degradation in correlation system with Gaussian Inter- 
ference. 


techniques. In addition, the method obtains useful answers 
for non-Gaussian interference and may be applied to 
both ordinary band-pass systems and wide-band correla- 
tion-detection systems. 

The most important conclusion is that the degradation 
in signal-to-interference ratio due to the presence of a 
limiter is quite dependent on the statistics of the inter- 
ference amplitude. However, when the signal-to-inter- 
ference ratio is low, the worst degradation (6 db) results 
for constant amplitude interference in both ordinary band- 
pass and wide-band correlation-detection systems. Ampli- 
tude fluctuations will reduce the degradation considerably; 
for example, the degradation with noise-like interference 
is about one db. The degradation (in db) may be expected 
to be negative for highly fluctuating interference (for 
example, impulsive interference). This indicates an impor- 
tant practical reason for incorporating a limiter into a 
communication system. 

Finally, the result is obtained that with Gaussian noise 
interference the degradation due to a limiter in a wide- 
band correlation-detection system does not vary signifi- 
cantly with interference bandwidth. Previous analyses 
have been restricted to the case of wide-band inter- 
ference only. 
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Recognition of Membership in Classes” 


GEORGE S. SEBESTYENf, MEMBER, IRE 


Summary—This paper presents an approach to the general 
problem of recognition of membership in classes which are known 
only from a set of their examples. A geometrical approach is taken 
where membership in classes is regarded measurable by metrics 
with which a set of points, representing different members of the 
same class, may be brought ‘‘close’’ to one another. For the case 
where classes are Gaussian processes, the method described 
herein and that of decision theory are found to agree. A practical 
application of the method to the ‘automatically ‘‘learned”’ recogni- 
tion of spoken numerals is described. 


INTRODUCTION 


S the advances of modern science and technology 
furnish the solutions to problems of increasing 
complexity, a feeling of confidence is created in 

the realizability of mathematical models or machines 
which can perform any task as long as a specified set of 
instructions can be given stating how the task is to be 
performed. There are, however, problems of long-standing 
interest which have hitherto eluded solution, partly be- 
cause the problems have not been clearly defined, and 
partly because no specific instructions could be given 
stating how a solution should be reached. Recognition 
of a spoken word independent of the speaker who utters 
it, recognition of a speaker regardless of the spoken text, 
the problem of threat evaluation, that of making a medical 
diagnosis, and that of recognizing a person from his hand- 
writing are only a few of the problems which so far re- 
mained largely unsolved for the above mentioned reasons. 

In all of the problems of pattern recognition, however 
different they may seem, there is a common bond that 
unites them and permits their solution with identical 
methods. The common bond is that the solution of these 
problems requires the ability to recognize membership 
in classes, and, more important, 1t requires the automatic 
establishment of how to measure membership in each 
class. In word recognition, the class is a specific word of 
interest and members of the class are different utterances 
of the word by different speakers. If membership in the 
class could be recognized, then the word, independent of 
the speaker, could be identified. Similarly, ‘“‘speech by 
a given speaker” or “handwriting by a given person” 
are classes in which membership must be recognized in 
solving the problems listed above. In a similar manner, 
the rendering of a medical diagnosis is the recognition of 
the patient as a member of the class of individuals having 
a particular-disease, while threat evaluation (say the deci- 


* Received by the PGIT, August 17, 1960. The work reported 
in this article was undertaken as part of the D.Sc. dissertation work 
at Mass. Inst. Tech., Cambridge, Mass., and was continued under 
Contract AF 30(602) )-2112 at Melpar, Inc. , Boston, Mass. 

+ Litton Systems, Inc., Advanced Dev. Lab., Waltham, 


Mass. ; 
formerly with Melpar, Inc., Boston. 
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sion of “attack” or “‘no attack’’) consists of the recognitio 
of the present situation as a member of the class 
situations which constitute threat. 

The purpose of this paper is to present a heuristi 
approach to the recognition of membership in class 
known only from a given set of their examples. Firs 
the fundamental ideas and underlying assumptions 0 
which the theoretical approach is based are discusse¢ 
A mathematical embodiment of these ideas is then de 
veloped. For a special selection of classes, agreement wit 
a decision-theoretical approach is demonstrated. In furthe 
support of this approach, results obtained in its successft 
experimental verification are described. 


FUNDAMENTAL IDEAS AND ASSUMPTIONS 


The desired objective is to find automatic methods fe 
learning how to measure membership in classes that ar 
known only through a set of their examples. It will Db 
assumed in the following that representative examples 
the classes to be recognized are given and from thes 
examples class definitions are to be constructed whie 
can be used to classify correctly each new occurrence a 
a member of the class to which it actually belongs. W 
will consider a general event—a member of any of th 
classes—represented by a point or vector in an N-dimer 
sional observation space which serves as a model of th 
physical world. Each dimension expresses a property ¢ 
the event, a type of observation that can be made about 11 
The entire signal which represents all the informatio 
available about the event is a vector v = (v1, vo, --- , Uy. 
the coordinates of which have numerical values whic 
correspond to the amount of each property which th 
event contains. In this representation, a set of event 
that belong to the same class correspond to an ensemb 
of points scattered within some region of the observatio 
space. A set of sample points from a given class mig 
be expected to “‘cluster”’ in the N-dimensional space i 
the sense that distances between members of the sam 
class are smaller, on the average, than those betwee 
points that belong to different classes. Unfortunatel: 
this state of affairs cannot generally be expected to exis 
Therefore the concept which plays a central role in tk 
theory which will be described is the notion that poin 
in the observation space which represent a set of no! 
identical events belonging to a common class must | 
close to one another as measured by some as yet unknow 
method of measuring distance, since the points represe! 
events which are close to one another in the sense thé 
they are members of the same class. Mathematical! 
speaking, the fundamental notion underlying the theor 
is that similarity (closeness in the sense of belonging 1 


same class or category) is expressible by a metric 
method of measuring distance), by which points 
resenting examples of the class we wish to recognize 
found to lie close to each other. 
o give credence to this conjecture, consider what is 
nt by the abstract concept of a class. According to 
of the possible definitions, a class is a collection of 
ags which have some common properties. By a modifi- 
ion of this thought, a class could be characterized by 
‘common properties of its members. A metric by which 
nts representing examples of a class are close to each 
er must therefore operate chiefly on the common 
perties of the examples and must ignore, to a large 
ent, those properties not present in each example. 
isequently, if a metric were found which called examples 
he class ‘‘close,’’ somehow it must exhibit their common 
perties. 
“o present this fundamental idea in a slightly different 
y, we can state that a transformation on the observa- 
1 space which is capable of clustering the points repre- 
ting the examples of the class must operate primarily 
the common properties of the examples. A simple 
stration of this idea is shown in Fig. 1, where the 


2! 


1 1! 


Fig. 1—Clustering by transformation. 


emble of points is spread out in the observation space 
ly a two-dimensional space is shown for ease of illu- 
ution), but a transformation T of the space is capable 
clustering the points of the ensemble. In the above 
mple neither the signal’s property represented by 
rdinate 1 nor that represented by coordinate 2 is 
ficient to describe the class, for the spread in each is 
re over the ensemble of points. Some function of the 
» coordinates, on the other hand, would exhibit the 
amon property that the dispersion of the points about 
xed straight line is small. In this specific instance, of 
rse, correlation between the two coordinates would 
ibit this property; but in more general situations 
ple correlation will not suffice. 

f the observation space were flexible (like a rubber 
et), the transformation 7 would express the manner 
which various portions of the space must be stretched 
compressed in order to bring the points together most 
sely. 

\Ithough thinking of transformations of the space is 
as general as thinking about exotic ways of measuring 
stance’’ in the original space, the former is a rigorously 
rect and easily visualized analogy for many important 
ses of metrics. 
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As any mathematical theory, the one which evolved 
from the preceding ideas is based on certain assumptions. 
The most basic assumption is that the N-dimensional 
space in which events exemplifying their respective classes 
are represented is a complete enough model of the physical 
world to contain information about the common properties 
which serve to characterize the classes. The significance 
of this assumption is appreciated if we consider, for 
example, that the observation or signal space contains 
all the information that a black and white television 
picture could present of the physical objects making up 
the set of events which constitute the examples of a class. 
No matter how ingenious the data processing schemes 
that we might evolve may be, objects belonging to the 
class ‘red things’ could not be identified, because re- 
presentation of the examples by black and white tele- 


-vision simply .does not contain, color information. For 


any practical situation one must rely on engineering 
judgment and intuition to determine whether the model 
of the real world (the observation space) is complete 
enough. Fortunately, in most cases, this determination 
may be made with considerable confidence. 

A second assumption states the class of transformations 
or the class of metrics within which we look for the 
“best.”” This assumption is equivalent to specifying the 
allowable methods of stretching or compressing the 
observation space within which we look for the best 
specific method of deforming the space. In effect, an 
assumption of this type specifies the type of network 
(such as active linear networks) to which the solution 
is restricted. 

The third major assumption is hidden in the statement 
that we are able to specify when the solution is best. 
In practice, of course, we can frequently say what is 
considered a good solution even if we do not know which 
is the best. The criterion by which the quality of a metric 
or transformation is judged good is thus one of the basic 
assumptions. 

The sufficiency of the examples as a representative set 
is also an assumption which needs to be considered. It 
is perhaps the most important assumption for, in practice, 
concurrence regarding the sufficiency of a set of examples 
is most difficult to obtain. Within the constraints of these 
assumptions the mathematical embodiment of the funda- 
mental ideas will now be outlined. 


MINIMIZATION OF MBAN-SQUARE DISTANCE 


The task of learning how to measure membership in 
classes consists of partitioning the observation space into 
regions in a manner which depends optimally on the 
distribution of the known sample points in the space. 
In the above process, sets of examples of each of several 
classes are assumed given, and each sample point is 
labeled with the name of the class to which it is a priort 
known to belong. Analogous to the application of likeli- 
hood ratios in decision theory, a convenient way to 
partition the observation space into regions, one cor- 
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responding to each class, is to generate for each class a 
quantitative measure of how 
“close’’ an arbitrary point of the space is to members 
of a specific class. In a sense, each function measures the 
similarity of an arbitrary point to a class, and partitioning 
the space is accomplished by assigning the point to that 
class to which it is most similar. We will arbitrarily choose 
the mean-square distance between a point « and known 
members of the class to serve as a quantitative measure 
of similarity S between x and the class. This definition 
is expressed by (1), where f,, is the mth known member of 
class F, d( ) is an as yet unspecified metric which expresses 
the sense in which members of F are closest to one another, 
and M is the number of given members of F; 


function which gives a 


M 
NSN =e He (l) 
NG kee 
The criterion for selecting the metric is that, if distance 
is measured by the optimum one from a class of metrics, 
then the mean-square distance between members of F 
(a measure of clustering) should be minimum. Note that 
the unknown in the minimization is the metric which is 
the error criterion in measuring similarity. 

In the following derivation, we will carry out the 
minimization for a simple class of metrics obtained by 
considering the Euclidean distance measured on a linear 
transformation of the space, where the transformation is 
subject to a suitable constraint devised to assure a non- 
trivial solution. The Euclidean distance, after linear 
transformation of the space, is expressed by (2a), and a 
constraint which assures a nontrivial solution is given in 
(2b). The constraint prevents minimization by the general 
collapse of the space by fixing the product of the squared 
lengths of the sides of all N-dimensional parallelepipeds. * 
By Hadamard’s theorem, this constraint also holds volume 
invariant under orthogonal transformations. 


thod=Vo[Sada-af; es 
= I (Seah) = JL. (2b) 


We now wish to minimize the quantity Q with which we 
denote the mean-square distance between members of F: 


lh 


Before proceeding with the minimization, the quadratic 
form @ may be simplified. Expanding the squared expres- 
sion as a double sum and interchanging the order of 
summations result in (4), where the bracketed quantity 
is a constant that depends only on the known samples 


Minimize Q 


~ M( 5 (3) 


~> > S b> Osi es 


m=1 p=1 i=l 7=1 


1G, Birkhoff and 8. MacLane, ‘‘A Survey of Modern Algebra,”’ 
The Macmillan Co., New York, N. Y.; 1953. 


of class F; 
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The above equation may be simplified by recognizin, 
that the constant (the bracketed expression) is an elemer 
of the sample covariance matrix U. 


qe 
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Hence the quantity Q may be written as in (6). Q maj 


where the vector a; = (@i1, Qs, °*° ; ey ene the pri 


denotes the transpose vector: 
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The constraint (2b) can also be expressed in matrt 
notation, and it is given in (8), where J is the identity 
matrix; 


Minimization of Q, subject to the constraint given in (8) 
may now be carried out by the method of Lagrange multi 
pliers.” The quantity Q and the constraint are differentiatet 
with respect to a, and their linear combination is equate 
to zero in (9), where the constant 21/7 /(M — 1) is lumpe 
into the Lagrange multiplier X. 


ya, [U — M][] aad] da; = 0. (9 

LAt 
Since da; is arbitrary, every term in the above sum mus 
be independently zero. Since the expression in parenthese 
is a constant that depends on 7, the simplifications indi 
cated in (10b) and (10c) can be made. 


alU —xCl] ale) = 0; 2 ee 
lAa 
a(U = d.1) = 0)" Ge 12 os eee 
where 
Soria) * (10¢ 
LAr 7 


from (8). For every eigenvalue \; for which (11) has | 
solution, there is a corresponding eigenvector a; whic 


2 F. B. Hildebrand, ‘“Methods of Applied Mathematics,’’ Prer 
tice-Hall, Inc., Englewood Cliffs, N. J.; 1952. 
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column of the desired linear transformation A of 
bservation space. 


|U —r, I | = 0. (11) 


nee the covariance matrix is positive definite, its 
values are real, and the corresponding eigenvectors 
rthogonal.” However, we have still not completely 
ed for the transformation coefficients, for the magni- 
3 of the eigenvectors V 6; are not known yet. 
» order to determine the 6,’s, we multiply (10b) by 
and sum over all z to obtain (12). This quantity is 
mean-square distance Q which we wish to minimize, 
ect to the constraint given in (8). The quantity to 
minimized is L, given in (13). Through use of the 
hod of Lagrange multipliers we obtain (14), where 
1 arbitrary constant. 


N N N 
O= > 4,Ua! = De Nia! =o Noe (12) 
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N N 
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nposing the constraint of (15), we can solve for the 
stant y, given in (16). 


ise (15) 
1 I] Ny 
N 1/N 
0 ee (II r.) : (16) 


stituting y into (14), we can solve for 6; as follows: 
1 N 1/N 
0; a (1 r.) ° 

eaters 


HYPERBOLIC CYLINDER 8.=1 
: i 
i=l 


(17) 
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Fig. 2—Geometric interpretation of minimization. 


he simple geometrical argument illustrated in Fig. 2 
fies that the solution thus obtained indeed minimizes 
.s a function of the 6,’s, Q is a plane [see (12)] whose 
rsection with the hyperbolic cylinder (the constraint) 
only one point of zero derivative. This point is a 
imum. 


Thus the optimum of the class of metrics we considered, 
which minimizes the mean-square distance between mem- 
bers of the class /’, is obtained by measuring the Euclidean 
distance on a linear transformation of the observation 
space. The transformation consists of a rotation and a 
diagonal transformation. Columns of the rotation trans- 
formation matrix R are unit eigenvectors of the sample 
covariance matrix U, and elements of the diagonal trans- 
formation D are inversely proportional to the square 
roots of the corresponding eigenvalues. The function S, 
defined by (1), is now given by (18) and denoted by f(x), 
to simplify the notation. The bar denotes averaging over 
the given sample points f,,. 


Saif), DID I ae) ie: 


(18) 


Similar functions may be developed for each of the classes. 
Suppose that there are only two classes F and G, and 
that the mean-square distances between « and members 
of the respective classes, as measured by two different 
metrics, are denoted by f(a) and g(a). The decision whether 
x is a member of F' or G is made by comparison of f(x) — 
g(x) with a threshold T, as expressed in (19). The locus 
of points for which f(x) — g(x) = T, a constant, serves 
as the boundary between the two regions into which the 
observation space is partitioned. 


f(a) — ga) 2T. (19) 


RELATIONSHIP TO Decision THEORY 


The relationship between decisions based on likelihood 
ratios and those made by the method described thus 
far are discussed in this section. It will be shown that. 
if the classes are Gaussian processes with unknown but, 
in general, different means and variances, then S defines 
contours of equal a posterior? probabilities. That is, 
S(a, {fn}) measures the mean-square distance by a non- 
Euclidean metric between a point « and M members of 
an ensemble of points {f,,},and is a measure of the proba- 
bility that x belongs to class F. 

Fixed values of S correspond to contours of equal 
a posteriori probability. The ratio of a posteriori proba- 
bilities is proportional to the likelihood ratio, the logarithm 
of which will be shown to be equal to f(a) — g(x) above. 

Consider the situation where an arbitrary event « may 
be a member of only one of two classes F or G. The like- 
lihood ratio that « belongs to F rather than to G is ex- 
pressed by the ratio of a posteriori probabilities in (20), 
which may be simplified by Bayes Rule: 


PF /x) _ pla/F pF) /p@) . pela) _ 
p(G/x) — p(a/G)p(G)/p(x) — pe() 


The likelihood ratio is thus proportional to the ratio 
of the two joint probability densities of the two Gaussian 
processes. In the event that membership in either of the 
two classes is a priori equally likely, the proportionality 
becomes an equality. 


l@). (20) 
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Now let us examine the probability density pr(x), 
a factor of the likelihood ratio (x). For the multivariate 
Gaussian process, the joint probability density is given 
by (21), where U is the covariance matrix of F, and 
[U,,] is the cofactor of the element with like subscripts 
in the covariance matrix.” It should be noted that |U,,|/|U| 
is an element of U~'. 


| U ithe: 


(2m) N/2 


 — m,)(a, — m) | AES 


7 |1/2 
U | 


- exp [= >» dS | Un | @, -— m)@, — m) | (21b) 


Contours of constant joint probability density occur 
for those values of x for which the argument of the ex- 
ponential is constant. The exponent expressed in matrix 
notation is 


exponent = [—3(a — m,)U ‘(x — m,)’]. (22) 


It will be recalled that one of the operations on the 
set of points {f,,} which the optimum metric performed 
was a rotation, expressible by an orthogonal matrix R. 
This is a pure rotation (an orthonormal transformation), 
where columns of & are unit eigenvectors of the covariance 
matrix U. 

Let y be a new variable obtained from x by (28). 
Substituting (23b) into (22), (24) is obtained as follows: 


y = xR, (23a) 
tole (23b) 
exponent = [—3(y — m, RU" [R']'(y — m,)’]. - (24) 


Since FR is orthogonal, the special property of orthogonal 
matrices that R-’ = R’ may be utilized to simplify (24). 
This yields 


exponent = [—43(y — m,)R’/U'R(y — m,)’]. (25) 


Furthermore, since columns of & are eigenvectors of the co- 
variance matrix U, the matrix R must satisfy (26a), where 


A is the diagonal matrix of eigenvalues of | U — X,/ | = 0. 
RIG = Ajke=— 0; IOs AION Bee Oh (26a) 

y 0 | 
ep (26b) 


F 


By taking the inverse of both sides of (26a) and employ- 


Av 


’W. B. Davenport and W. L. Root, “Random Signals and 
Noise,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1958. 
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ing again the special property of orthogonal matrices, (2 
may be obtained. This latter expression, when substitute 
into (25), produces the result stated in (28). 


R'U'R = A”; (27 


exponent = [—4(y — m,)A°‘(y — m,)’]. (2 


The quadratic form of (28) expresses the fact that contour 
of constant probability density are ellipsoids with center 
at m,; the direction of the principal axes are along eige 
vectors of the covariance matrix, and the diameters ar 
equal to the corresponding eigenvalues. This result cal 
be shown in a more familiar form by converting the 
quadratic form of (28) to a sum, as shown in (29), u 
which y,, is the coordinate of y in the direction of the ntl 
eigenvector, and m,, is the mean of the ensemble in thy 
same direction: 

al 


— m, | 
dn 
. 
An expression of identical appearance can be derivee 
from the exponent of the joint probability density of clas: 
G. The differences between the two exponents exist in L 
the directions of their eigenvectors, 2) the numerica 
magnitude of their eigenvalues, and their ensemble eae 
Denoting the exponents in the two probability densiti 
by e;(v) and e,(x), the logarithm of the likelihood ratic 
may be written as in (30), where K is a constant whicl 
involves the ratio of a priori probabilities and the rati 
of determinants of the two covariance matrices: 


log l(a) = K + e,(x) — e, (2). 


exponent = i 


(30 

Now we will show that e,(a) is proportional to g(a) 
and e,(«) is proportional to the previously derived f(x) 
and thus prove that, for the special case when classe 
are Gaussian processes, partitioning the observation spac 
into regions by (19) is identical to that achieved throug] 
decision theory. In accordance with the foregoing remarks 
we wish to prove 
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Recognizing (from the definition of D) that DI D’ = A~ 
and making the change of variables, 7 sk, (382)— 
obtained, where /’,, is the transformed vector f,,. 


{a)=y = Eke G = Fey 
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x e€;(x). 


Writing this as a sum and interchanging the order ¢ 
averaging and summation yields 
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By expanding the square, and adding and subtracting (F,, 
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n each term of the numerator, we obtain (34), where 

m, and F; — (F,)? = 0, = 2,. Thus the proportion- 
y of f(x) and e,(x) is established. It is now readily seen 
t f(x) — g(x) = e,(x) — e,(x), and contours of constant 
lihood ratio U(x) are identical to contours of constant 
) — g(x). 
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EXPERIMENTAL RESULTS 


An experimental program was devised to verify the 
hnique discussed above. This program, consisting of 
chine recognition of spoken numerals, was of suffi- 
ntly high dimensionality to exhibit not only the success 
the method, but also the ease with which it can be 
plemented on present-day computers. The numerals 
sro”’ through ‘‘nine’”’ were spoken a number of times 
ten male speakers drawn from the Northeast section 
the United States. Each utterance of a numeral formed 
» basis of a sample point in a high-dimensional space. 
.e utterances were represented as vectors by means 
an eighteen-channel Vocoder.* The Vocoder is a set 
eighteen stagger-tuned band-pass filters that produce 
the filter outputs the “instantaneous”’ frequency spec- 
im of the utterance, as a function of time. 
An example of the output of the Vocoder is given in 
x. 3, in which the ordinate is frequency and the abscissa 
time. The intensity of the spectrum at a given time and 
quency is indicated by the grayness of the sonograph 
the prescribed time and frequency point. In order to 
m a vector from the sonograph, the spectrum is sampled 
20-msec intervals in each of the frequency channels. 
ie array of sample heights is a vector, an example of 
ich is given beneath the sonograph of Fig. 3. 
Because the duration of utterances of numerals varies 
th the numeral and with the speaker, all records were 
rmalized in time by multiplication by scale factors, 
d the sonographs were resampled to produce twenty 
1e intervals per utterance. The 18 * 20 = 360-dimen- 
nal vector was then augmented by the scale factor as 
additional dimension. This relatively unsophisticated 
proach to normalization was adopted because of the 
ited scope of the experiment. 
Recognition of membership in classes involves learning 
a small number of samples, at first, and then increasing 
» sample size while testing unlabeled points at each 
ge of the experiment to see whether they are correctly 
ssified. Learning to recognize spoken numerals, in this 
yeriment, consisted of forming the covariance matrices 
2ach of the ten sets of given sample words, and of solving 
the optimum transformations which maximally clus- 
ed each set. A new word was then classified as a member 


‘The Vocoder data used were made available through the 
rtesy of the AF Cambridge Res. Ctr. 
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Fig. 3—Vocoder representation of the spoken word “‘three.”’ 


of one of the spoken numerals by computing the ten 
different mean-square distances—the f(x)’s of (81)—and 
then deciding that the new word belonged to that class 
of numerals to which the corresponding f(#) was smallest. 

A typical result which demonstrates improvement in 
the machine’s performance as the number of known, 
labeled examples of spoken digits is increased, is illu- 
strated in Fig. 4. This figure contains four confusion 
matrices constructed for the cases where numeral recogni- 
tion was learned from 3, 4, 7, and 9 examples of each of 
the ten classes of digits. The ordinate of a cell in the 
matrix signifies the digit which is spoken, the abscissa 
denotes the decision of the machine, and the number in 
the cell indicates the number of instances in which the 
stated decision was made. The number 1 in row 6 and 
column 8 of Fig. 4(c), for example, denotes the fact that 
in one instance a spoken digit 6 was recognized as an 8. 
Note that the error rate decreases as the number of known 
examples of classes is increased. For the 9 examples per 
class, no errors were made. This result is particularly 
interesting in view of the fact that the digits which were 
tested were spoken by persons not included among those 
whose words were used as examples. 
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Fig. 4—Confusion matrices illustrating the process of learning 
spoken numeral recognition. (a) 3 examples per class, error rate 
45 per cent. (b) 4 examples per class, error rate 30 per cent. 
(c) 7 examples per class, error rate 10 per cent. (d) 9 examples 
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CoNCLUDING REMARKS 


A geometrical approach to recognition of membershy 
in classes is presented. The approach consists of postulating 
a new decision rule which involves the comparison 6 
mean-square distances between the input vector am 
members of the different classes. Distance to each class 
in this comparison, is measured by different metrie 
generated by minimization problems which cluster sampl 
points of each class separately. The metrics thus obtaine 
are shown to equate the new decision rule with Baye 
Rule when classes are assumed Gaussian processes, an¢ 
when the sample covariance matrix approximates thi 
covariance matrix of the process. This is, in a sense, wha 
is implied by the stated assumption that the example 
of the classes are representative. 

Several extensions of the method described were ob 
tained by consideration of larger classes of metrics in 
volving highly nonlinear operations. In addition, othe 
methods of generating metrics to be used in the decisior 
rule were considered. In particular, a number of solution: 
were obtained for generating metrics which minimize 
distances between members of the same class, whil 
separating members of different classes. A discussion 0 
metrics of this type and their decision-theoretical impli 


per class, error rate 0 per cent. 


CORRECTION 


Thomas Kailath, author of ‘Correlation Detection of 
Signals Perturbed by a Random Channel,” which appeared 
on pages 361-366 of the June, 1960 issue of these TRANs- 
ACTIONS, has brought the following changes to the attention 
of the Editor. 

On page 365, column 1, the first unnumbered equation 
after (27) should read 


(k) 
2. Gy N= 


(k) ) (Ck) ~  a(k) ack) 
E Ly TOL oO Way , ee 


X; _ x; Uy 


The second unnumbered equation after (27) should read 


cations will be the subject of another paper. 


rrespondence 


_Close-Packed Double Error- 
‘recting Codes on P Symbols* 


et p be a positive integer. Let E,? be 
‘space of all d-tuples whose entries are 
pn from the set 0, 1, 2, 5 (ea 
define the distance between two points 
4,4 as the number of places in which 
y disagree. A close-packed double error- 
ecting code is a subset S of H,? such 
L any two points of S are no closer than 
each other and such that any point of 
is within 2 of some (hence, a unique) 
nt of S. 
or p = 2, Shapiro and Slotnik! have 
wn such a code exists only for the gn 
: 5. For p = 3, such a code ford = 
found by Golay,2 and Lee* has mon 
to be the only possible d. Lee® has also 
wn there are no such codes for p = 4. 
Ve shall treat the case p = 5 with some 
ner general methods that we believe 
_extend to other p. Now if such a code 
sts, it is easy to see that the space H;4 
aks up into spheres of radius two about 
points of S. That is, all such spheres 
disjoint and together they fill #;?. 
> number of points in such a sphere is 
- Ad + 42d(d — 1)/2 and the number of 
4 spheres is 


5? 
2 Te (1) 
1+4q4 204 1) 
w, (1) is an integer, so 
2 
p4a4 2G) _ a <a. 
ve let 
Z=4d-1 (2) 
n the above equation becomes 
Tp A RU Se (3) 


Now, (3) is clearly satisfied for k = 0 
nd 2. But tracing through (2) and (3) 
see that these correspond to d = 0, 1, 
| 2, respectively, which are impossible 
double error-correcting codes. We wish 
nvestigate k > 2, and our result is that 
re are no further solutions. In (3), Z is 
iously odd, Z = 2n + 1, say. Eq. (3) 
omes 


ameter Wet il) (4) 
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We shall, from this point on, restrict k 
and n to solutions of (4). We show first 
Lemma 1. 

Lemma 1: 


n= 28 or -—29 (mod125). (5) 


Proof: If n runs through the integers 
0, 1, 2, 3, and 4, then n? + (m + 1)? runs 
through 1, 0, 3, 0, 1 (mod 5). Thus we see 
nm = 1 or 3 (mod 5). If n = 1 (mod 5), 
n = 5) + 1. Substituting this in (4), we 
obtain 


5 = 107? + 67 + 1, 


or 7 = —1 (mod 5),7 = 51 — 1, and hence 
n = 251 — 4. Substituting this in (4), we 
obtain 5*-2 = 50/2 — 141 + 1 orl = — 
(mod 5), 1 = 5¢t — 1. Son = 125 — 29, 
1.e.,n = —29 (mod 125). 

The corresponding computation shows 
that if we start with n = 1 (mod 5), we 
obtain n = 28 (mod 125). 

These results constitute a sieve on n; we 
really want a sieve on k. But we shall see 
that Lemma 1 is useful in obtaining con- 
ditions on k. 

We now pass to the Gaussian integers 
about which we need the following facts 
which we shall quote without proof. (See, 
for example, Landau* or Zariski and 
Samuel.®) The integers are the numbers of 
the form a + bi where a and b are natural 
integers. Unique factorization holds. 2 + 2 
and 2 — 7 are primes. The only units are 
+1, —1, +7, and —7. Let us factor (4): 


[m+ 1) + miJ[m + 1) — mi] 


=24+7)'2-7', © 


IRE TRANSACTIONS ON INFORMATION THEORY 51 


both be divisible by 5, which is impossible. 
This shows there are eight possibilities: 


(n oe 1) +m = ee —p, in’, 


ah = =. Bes Onn 
Uy yb i —_ hb ¥ oe 5; or —t iH (7) 
where » = 2 + 7. Now, let 
7 =F 
is ea 
B; oo 9 2 (8) 


Summing the geometrical series shows the 
generating function of the 6; to satisfy 


(1 — 4Z + 52”) Sy B;Z' 


=f DZ RRS) 
This yields a recursion formula for 6;, 

Bo = ig B, = +2, 

6; = 4G;71 = OB jae) 7 eee (10) 


On the other hand, if n and k constitute a 
solution of (11) we can compute 8, from 
(7) and (8). This yields the four possibilities, 


6, = =n, E(t Ly (11) 


From (11) and Lemma 1, we see that a 
solution of (4) entails 


B, = +28 or +29 (mod 125). (12) 


Now we must compute 6; from (10), 
watching for the occurrence of (12). It 
clearly suffices to compute (10) (mod 125). 
If we compute the 6; for any integral 
modulus, the 8; must be cyclic since the 
residue class is finite and the repetition 
of any consecutive pair of 6;’s means the 
recursion repeats. Table I represents the 
computation of the 6; (mod 125). 


TABLE I 
8B; (mod 125) 


Ge | 1|/2]3 4 | 5 6 7 | | 9) 10 | 11 | 12 | 13 14 15 16 

By | 11213 )2 |) =7 | =—38 18 | —28 | —27 | 32 | 13 | 17 | 3 | 52 | —57 12 —42 
Gaels 19 20 21 22 23 24 | 25 | 26 27 28 | 29 | 30 | 31 | 32 
B; |22 | 48 | —43 ) —37 | =58 | —47 | —23 ) 18 | 62) 33) —53 |) —2 | 7 | 38 | —8 | 28 


with the possible inclusion of some units. 
We now wish to find the factors of 
(n + 1) + ni. Should (n + 1) + ni have 
as factors both 2 + 7 and 2 — 7, it would 
have a factor of 5 and hence its real and 
imaginary parts, (n + 1) and n, would 


4E. Landau, ‘‘Elementary Number Theory,” 
Chelsea Publishing Co., New York, N. Y. (reprint); 
1958. 

50. Zariski and P. Samuel, ‘‘Commutative 
Algebra,’ D. Van Nostrand Co., Inc., New York, 
NER Yeu 819) 


Applying the criteria oe a2) to Table I, 
we see that 


k= Gned-2): (13) 

We shall now give a demonstration due 
to H. F. Mattson that excludes the possi- 
bility of (13) and hence the existence of 
close-packed double error-correcting codes 
on five symbols. The basic idea is that if 
2-5" — 1 is a square, then it is a quadratic 
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residue for any modulus, an immediate 
consequence of the definition of quadratic 
residue. Let g be a prime + 5 and ?¢ be 
the order of 5 (mod q), 7.e., some power of 
5 is congruent to 1, e.g., 5%! by Fermat’s 
theorem, and we let ¢ be the least positive 
integer such that 54 = 1 (mod q). Then 


57" —~ 1 =5"' —1 (mod q 


j= 0,1,2,-° 


Thus the numbers of the form 5/‘t” — 1, 
7 SQ, Ny & - are either all quadratic 
residues or all quadratic nonresidues. To 
show the impossibility of (13), it suffices, 
therefore, to find a prime q such that 
57 — 1 is a nonresidue and 5 is of order 
25 (mod q). We shall show q = 101 suffices. 
Some powers of 5 (mod 101) are 


5° = —6, 5’ = —49, and 5° = 1. 


This shows that 5 is of order 25 and that 

2-57 — 1 = 2 (mod 125). But 2 is a non- 

residue of primes of the form 8n + 5 (see 

Zarinski and Samuel®) and, in particular, of 

101. This completes our proof that there 

exist no close-packed double error-correcting 
codes on 5 symbols. 

CarL ENGELMAN 

The Mitre Corp. 

Bedford, Mass. 


Matched Filters for Multiple 
Processes* 


By a multiple process is meant a q- 
dimensional (real-valued) vector stochastic 
process. As is well-known, a matched filter 
is defined for one-dimensional processes as 
one that minimizes the noise-to-signal 
ratio.! Now, in the multiple process case 
we can define noise-to-signal ratio in several 
ways and it is the purpose of this note to 
examine the corresponding matched filters. 

A filter (linear filter) now corresponds to 
a m-by-q matrix function W(t). We have 
q inputs and m outputs related by 


Of) ; We eee 


where X(c) is the g-dimensional input 
process and Y(t) the m-dimensional output. 
For physical realizability it is necessary 
that 


W(t) = 0 [zero matrix] for t < 0, 


and we shall assume this in what follows. 
Suppose now X(t) is composed of signal 
and noise 


X(t) = S() + NG) 


* Received by the PGIT, July 22, 1960. 

1G. L. Turin, “An introduction to matched 
filters,’ IRE Trans. on INFORMATION THEORY, vol. 
IT-7, pp. 311-329; June, 1960. 


where S(t) is the g-dimensional determinate 
signal, and N(t) is the q-dimensional noise 
(stochastic) process. It is convenient to 
think of S(t) and N(t) as g-by-1 matrices. 
The covariance matrix function R(s, ¢) of 
the noise process is defined by 


R(s, t) = B[N(s) NO"), 


the asterisk denoting the adjoint. The 
filter W(t) acting on X(¢) being linear, we 
have both noise and signal ‘‘terms’’ in Y(¢). 
The response at an instant of time T is 
given by 


Y(é) = i “WT — 1) S(b dt 


ee i! “WT — t) N(b dt. 


The noise-to-signal ratio N/S in Y(t) can 
be defined in various ways. First, if we 
consider them as vectors, the squared 
magnitude of a vector being the sum of 
the squares of the components, we can 
define 


average squared magnitude of noise 


Januar 


For any choice h,, let 


iia [hi, S13 


Then 


m 


SRA he 
(N/S): = =p eee 


Keeping {y;} fixed, suppose we vary {hy 
in order to minimize the ratio (N/S) 
Then it follows, as shown in a previou 
article,? that the minimum is given by 


1 


[h, S] 


where h is the solution of the (integral 
equation 


Rh = 8. 


(N/S)1 — 


Again, we can also define it as 


average squared magnitude of noise 


squared magnitude of signal at time 7 


(N/S)2 = 


Also as 


(N/S)s 


We shall show that all of these notions 
lead to the same matched filter. For this, 
it is convenient to introduce the notion of 
a linear operator R on the (Hilbert) space 
5C of dimensional square-integrable 
functions defined on [o, T]. Thus, for any 
element g in JC, we define 


Rg =h 


where 


Hebe i: TACO: 


The inner product in 3C of any two ele- 
ments h and g with components h;, g; will 
be denoted 


qa fh 
Ig, h] = Ea hi(t) g(t) dt. 
t=1 0 
Let 


DO Wie ACE sy), (WES ess I 


be the rows of W(T’ — t). Then each h; 
belongs to 3C and the average of the sum 
of the squares of the noise components of 
Y(t) is easily seen to be 


™ 


Bale ll: 


a=1 


square of the sum of the signal components at time Te 


sum of noise-to-signal ratios of each component. 


The optimal {h,;} corresponding to this 
minimum is given by (within multiplicativ 
scalar constants) 


heh, i=1,--m @ 


In a similar manner, it can be shown that 
the same solution minimizes the other 
(N/S) ratios. The minima are given by 


' 


1 


; 1 
min (N/S). ~ mh Sl’ 


1 


min (N/S)s3 = CibeSin 


The details of the solution when there is no 
element h in $C such that 


Rh=S8 


may be found in a previous paper.” 
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ential Generation and Decoding 
e P-Nary Hamming Code* 


veral recent papers! have shown how 
sequential circuits operating over 
dular field may be used to generate 
decode single- and multiple-error 
cting codes. This note is concerned 
the possibility of generating and 
ling Hamming codes! with the same 
of circuit, which has the feature of 
ring only as many delays as there 
hecking digits, and also leads to a 
e decoder. Other papers®** have been 
rned with sequential generation of 
-correcting codes, but have resulted 
reuits with a larger number of delay 
nts than necessary, and have con- 
d only the binary case. 
e Hamming codes have the property 
the information digits pass through 
oder unchanged, and that the checking 
s are linear combinations of the 
mation digits. The conclusions of this 
stigation are: 1) The Hamming codes 


be generated by linear modular 
ential circuits whose order (7.e., the 
ber of delay elements) equals the 
ber of check digits, but this circuit 
t be time-varying. 2) The decoder, 
+h is the inverse of the coder, has a 
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Yr hy, 0) 
Y2 lp ine 
LYm+k [ickegee 


Correspondence 


structure which is simply related to the 
coder. In the binary case, in fact, the coder 
is self-inverse, 7.e., the decoder and coder 
are identical. 3) A simple error-correction 
scheme may be used. 

For the coder to be realizable as a 
sequential circuit, the check digits occur 
after the information digits which they 
check. For simplicity we have chosen to 
have the coded word consist of the informa- 
tion digits in the first m positions and 
the check digits in the last k positions. 
Hence the input to the coder will be the 
sequence %1%2 +++ 2m 00 +-: O, where the 
number of zeros equals the number of 
check digits to be generated in the coder. 
The output of the coder will be 


YiY2 °° * YrmYm+iYmt2 °° Ymtky 
where y; = a; fori < m, and y; equals 
the Hamming check digit for 7 > m. 

Now, in a linear sequential circuit, the 
input-output relation may be represented 
in the following matrix form: 


0 Ds 
0 0 1B 
Gn 
0 
Dercnerstal 0) 
where 
hi; = hi-; (1) 


if and only if the circuit is time-invariant. 
Note that the transmission matrix! [h;;] 
is lower triangular, a necessary condition 
for it to represent a realizable network. 
Since the coding scheme requires that 
Ya = Ui, Y2 = U2, °° * 5 Ym = Im, 1b1S apparent 
that 
ge =" 03; NSS 710" 

Also, since 2; = O for 7 = m + 1, 
m+ 2,-::,m+hk, wecan set h(i, 7) = i; 
for7,j =m+1,m-+2,--:,m-+k, where 
¢i; is arbitrary. However, physical reali- 
zability requires that ¢;; = 0 forj7 > 7 
and the existence of an inverse for H 
necessitates that ¢:; # 0. Hence we find 
that the transmission matrix H of the 
coder may be partitioned in the following 
manner: 


TaeO 
C ®,. 


H = 


10 B, Friedland, ‘‘A technique for the analysis 
of time-varying sampled data systems,’ Trans. 
AIEE, vol. 73, pt. 2, pp. 407-414; January, 1957, 


53 


where I, is an m X m unit matrix, @; is 
an arbitrary & X k nonsingular triangular 
matrix, and Ois the null matrix. The matrix 
C is the check matrix whose elements are 
determined by the nature of the checking 
scheme to be employed. 

To demonstrate that H cannot be the 
transmission matrix of a time-invariant 
circuit we consider first an example for 
p = 2,m = 4, k = 3; the corresponding 
transmission matrix is 


a | 


lL @ @ @ 3 
| 
On 1 Oig ea 
Wee OL 
| 
Wet) Cr es 


hso hs hse hss 1 & 
Lheo her heo hes 5 ra 


Since fio = 0, then haz = hoa = hes = O 
to satisfy (1). Likewise hz = 0 and ha = 0 
forces the condition hy = ha = hs = 
hss = hes = 0. Hence the 4th column of 
the check matrix C is zero, which implies 
that the check digits are independent of 
the information digit x(3), implying that 
this digit is not checked. In general, a 
fixed system would require n — k — 1 
zeros, where n = m + k, following the 
nonzero elements on the major diagonal. 
In order that the nth digit be checked, 
there must be at least one nonzero element 
in the last k rows of the mth column. These 
are the k elements following the nonzero 
element on the major diagonal, and thus 
n —k —1 < k. Using the relation that 
(p.— 1)n = p* — 1 for close-packed codes 
(including the Hamming code) it is found 
that 


=ptp +: +p, @ 


which cannot be satisfied for any p > 3 
or k > 2. Thus, a time-invariant coder 
leaves some digits unchecked, and hence, 
the Hamming coder cannot be _ time- 
invariant, except for two trivial cases of 
only one information digit each. 

The simplest coder and decoder result 
when ® = J;, ak X k unit matrix. Then we 
obtain the transmission matrix of the coder 


H=| 
Crete! 


Now the decoder has a_ transmission 
matrix Hz which is the inverse of the 
transmission matrix of the coder. Hence, 


owing to the selection of @ = J; we find 
that 
a Pa tas Oxo 
pe 3 
—C I, 
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It is seen that the decoder and coder are 
identical in structure, the only difference 
being that the matrix —C of the decoder 
is the additive inverse, modulo p, of the 
corresponding coder matrix. In the binary 
case the coder and decoder are identical, 
since C = —C, modulo 2. 

The check matrix C of the coder may be 
obtained using the Hamming check rules’, 
and is not unique. The only rule to be 
followed in constructing C is that c:; ¥ 0 
if and only if the 7th check digit checks 
the jth information digit. However, by 
adapting Huffman’s error-correcting techni- 
que!3 we can obtain a very simple decoding 
system. The principle of this technique 
is the choice of a unit response whose 
m + 1,m + 2, ---,m + k symbols uniquely 
identify the position in which the error 
has occurred. But, since in the Hamming 
code a single error does not propagate to 
the other information digits at the input 
to the decision mechanism, only one 
information digit must be corrected. Errors 
in the check digits alone need not be cor- 
rected. 

Each single error in an information 
digit will result in a distinct sequence in 
the check digits (at the input to the decision 
mechanism). This sequence will contain 
at least two nonzero elements (since errors 
in the check digits will produce sequences 
containing a single nonzero element). An 
error of magnitude +q (q < p) will produce 
a check sequence such that each digit is 
q times (mod p) the corresponding digit 
for an error of +1. An error correction 
sequence is required that contains con- 
secutively all combinations of the check 
digits for single errors of +1 in the informa- 
tion digits. This sequence differs from the 
maximal length sequence of Stern and 
Friedland? in that it does not include com- 
binations with only one nonzero element. It 
is possible to obtain such a sequence by con- 
sidering only the first (p* — 1)/(p — 1) — 1 
terms of a maximal length sequence, if the 
sequence is chosen such that the first 
term is unity and the next k — 2 terms are 
zero. This choice is always possible by 
proper selection of the numerator poly- 
nomial of the generating pulse-transfer 
function. 

Error correction is obtained by com- 
paring the check digits entering the decision 
mechanism with the error sequence. The 
first comparison is made between the first 
k digits of the error sequence and the k 
check digits. The error sequence is then 
shifted one place for the second comparison, 
such that the second through (k + 1)st 
digits of the error sequence are compared 


with the check digits. Subsequent com- 
parisons are obtained by further shifts. 
If a match is obtained or if the check 
digits are multiples (digit by digit, mod p) of 
the correponding k digits of the error se- 
quence on the rth comparison, then the digit 
in error is the (n — k + 1 — r)th. This digit 
is in error by +q, where q times the error se- 
quence matches the check digits. 

The C matrix must now be arranged in 
the order corresponding to the error 
sequence by an appropriate renumbering 
of the information digits. Thus, for example, 
the last column of the C matrix contains 
the first k digits of the error sequence, in 
order, and the first column contains the 
last k digits of the error sequence. 

As an example, consider the case dis- 
cussed above (p = 2, n = 7, k = 3). A 
possible sequence for this system would be 
1011100,10--- . The required sequence 
for the Hamming decoder is 1 0 1 1 1 0. 
Corresponding to this sequence, the infor- 
mation bits must be renumbered such that 
the second and fourth information bits of 
the standard Hamming word are inter- 
changed. The C matrix becomes 


er ts eaeia 
C= eae 
Oeey ee ae 


If the input to the decision mechanism 
were 1234 1 0 1, it can be seen by com- 
paring the check bits with the error sequence 
that x, is in error. If, on the other hand, 
the error sequence is shifted one place 
before a match occurs (7.e., the input to 
the decision mechanism is r:%2r3r; 0 1 1), 
a comparison looks like 


iba 49 Gosden, (OAL Al 
je OL IEIEO: 


In this case, x3 is in error. In general, all 
other information bits are correct. 

One realization of the coder is shown in 
Fig. 1. One check digit is generated in 
each indicated path. All summers are 
modulo p adders or subtractors as shown. 
The gains a;, being the proper multiplicative 
factors for the appropriate information 
digit, are time variant. The 6;’s can be 
realized simply by switches that are open 
except at the time of the 7th check bit 
output, = n — k — 1 +7. This realization 
assumes the use of p-parity, 7.e., 


De ai;x; + ys = 0 (mod p). 


‘ (3) 
| - =a? shtlieegih aia 
O [D , 


a a 
Olnks b, 
a oa 


Fig. 1 


Coder realization. 


The decoder is identical in form to th 
coder and differs only in that a; is replace 
by —a;. : 

For the C matrix of the example discusse 
above, the values of the multipliers a 
zero except 


a=1 at ¢=0,1,3 
a@=1 at ¢=0,2,3 
Qsc= 1 “at CS] 1e2es 
b=1 at ¢=4 
b5%=" 1, cats —5 
b= 1 at t= 6. 


The Hamming coder and decoder 2 
not self-clearing (which is also true of an 
Huffman-type system). The system can b 
cleared without any loss in time in eithe 
of two ways. The first method is to place 
a normally closed switch in each feedbae 
path. This switch would have the sam 
control as the b; (a normally open switch) 
The second method is to add a feedback 
path from the output of b; to the summe 
(at the input of the delay) with a negativ 
input as shown in Fig. 1 with a dashed line 
This would be nonzero only at the time o 
the output of that particular check digi 
and would cancel the feedback from th 
solid path. Neither method affects th 
transmission matrix H, since 6; is zero a 
all times after this clearing takes place. Ir 
normal operation, where the input i 
repetitive (7.e., a new input occurs every 
n time units), this allows operation without 
interference among words. 

The author wishes to acknowledge thi 
many helpful discussions with Profs. B 
Friedland and T. E. Stern. 


ALAN B. Marcoviti 
Columbia University 
New York, N. ¥ 
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A Statistic Associated with the Joint Distribution of N Successive 
Amplitudes—W. C. Hoffman (in English). (Dept. of Math., Univer- 
sity of California, Los Angeles, Calif., Ph.D. dissertation; 1953.) 


The joint distribution of n successive amplitudes refers to 7 values 
from the discrete-parameter stochastic process R; = {Xj;? + Y,?}} 
(J an integer), where {X,;, Y;} constitute a stationary Gaussian 
process. The R; process has applications in electronics, though its 
sampling properties have not previously been investigated to any 
great extent. 

The covariance matrix of the Gaussian process is defined, and 
certain of its properties explored. Two lemmas are proved concerning 
the inverse covariance matrix and the eigenvalues of the covariance 
matrix. A useful property of a parameter of the /-distribution is 
determined and the covariance function of the R; process found. 
The bivariate, trivariate, and joint n-dimensional ‘probability density 
functions of the R; process are then derived, using the lemma on the 
inverse covariance matrix and some results from the theory of 
Bessel functions. 

A statistic g, consisting of the mean value of the sum of n suc- 
cessive values of the square of the R; process, is defined next, and 
shown to be an unbiased and consistent estimator of a parameter 
of the R-distribution. The characteristic function of g is found by 
a modification of the diagonal elements of the inverse covariance 
matrix of the Gaussian process and an application of a basic property 
of probability density functions. After some algebraic reductions, the 
characteristic function can be put in a form amenable to the Fourier 
inversion formula. Evaluation of the latter by the calculus of resi- 
dues yields the small sample distribution of qg in terms of the eigen- 
values of the covariance matrix of the Gaussian process. 

It is then shown that the statistic g, under a hypothesis that 
amounts essentially to the existence of a spectral density function 
for the Gaussian process, 1s asymptotically normally distributed. 
In the case of simple Markov dependence, the hypothesis of the 
theorem is automatically satisfied. 

A test is prescribed for independence versus simple Markov de- 
pendence for the f; process, assuming the availability of independent 
realizations. This test is equivalent, in the simple Markov case at 
least, to the uniformly most powerful one-sided test of the variance 
in random sampling from a normal distribution. The power function 
of the test is determined and depicted graphically for several sample 
sizes at the 95 per cent significance level. 

The Markov case of the R; process is studied in some detail. 
Among other results, it is shown that the process is ergodic. Formulas 
are given for maximum likelihood estimates of the parameters of the 
transition density functions. Using Wald’s and Kazami’s results on 
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the asymptotic properties of maximum likelihood estimates in the 
case of dependent random variables, the asymptotic efficiency of the 
statistic q is determined in the case of simple Markov dependence. 


Summary of Maximum Theoretical Accuracy of Radar Measure 
ments—R. Manasse (in Mnglish). (Mitre Corp., Lexington, Mass., 
Tech. Series Rept. No. 2.) 


This paper summarizes some general formulas for maximu 
theoretical accuracy of radar measurements on a target in the 
presence of additive white Gaussian noise. The formulas are special- 
ized to some particular cases of interest. 


Parameter Estimation Theory and Some Applications of the Theory 
to Radar Measurements—R. Manasse (in English). (Mitre Corp., 
Lexington, Mass., Tech. Series Rept. No. 3; no date.) 


The general theory of parameter estimation is developed using 
the inverse probability approach. Where the measurements are 
perturbed by additive Gaussian noise, and when the received in- 
formation is sufficient to determine the parameters of interest 
rather accurately, it is shown that an optimum method of processing 
redundant data based on the maximum likelihood approach reduces 
approximately to the solution of k nonlinear equations in the k 
unknown parameters. An expression is derived for the resulting 
error moment matrix of the parameters. It is shown that thisl 
same moment matrix for a minimum variance estimate is obtained 
by using results derived by Cramér. The theory is illustrated by 
applying it to several radar measurement problems of interest. 


Theory and Application of the Separable Class of Random Processes. 
—A. H. Nuttall (in English). Res. Lab. of Electronics, Mass. Inst. 
Tech., Cambridge, Mass., Tech. Rept. No. 343; May 26, 1958.) { 


The separable class of random processes is defined as that class 
of random processes for which the g-function, 


Gy P= / (e, — p)plws, 22; 7) de, 


separates into the product of two functions, one a function only of «a, 
the other a function only of +. The second-order probability density 
function of the process is p(a1, v2; 7) and has pw as its mean. Various 
methods of determining whether a random process is separable are 
developed, and basic properties of the separable class are derived. 
It is proved that the separability of a random process that is 


sed through a nonlinear no-memory device is a necessary and 
cient condition for the input-output crosscovariance function 
e proportional to the input autocovariance function, whatever 
inear device is used. The uses of this invariance property are 
ted out. 

Fa nonlinear no-memory device is replaced by a linear memory- 
wble network, so as to minimize the mean-square difference 
een the two outputs for the same separable input process, 
ysis shows that the optimum linear network has no memory. 
ple relations among correlation functions for these circuits are 
» derived. 

ome results on Markov processes and best estimate procedure 
derived, important examples of separable processes are given, 
possible generalizations of separability are stated. 


liometer Techniques in Radar—R. Price (in English). (Lincoln 
»., Mass. Inst. Tech., Lexington, Mass., Group Rept. 34G-0003; 
e 10, 1960.) 


, gated radiometer is described that is appropriate to the detection 
adar echoes from a scintillating target whose dynamical behavior 
yell known, when the echoes are submerged in a strong back- 
ind of white noise. A possibly novel feature of the gated radio- 
er is that simply by adjusting its predetection filter it can be 
le to perform radar sweep integration that is purely predetection, 
ely postdetection, or a mixture of both. The close relationship 
he gated radiometer to sweep integration and conventional radio- 
ry is discussed. 

‘he output signal-to-noise ratio of the gated radiometer is de- 
-d and special cases of interest are examined. Range-frequency 
yiguity behavior is studied, and the advantages of coding the 
ysmission are mentioned. A proof is given that under certain 
ditions, which are probably quite realistic, the gated radiometer 
n optimum detector. 

‘nally, recent applications of gated radiometer techniques are 
sribed in part. Radar echoes from Venus, from the Sun, and from 
electrons in the ionosphere have been detected by such means, 
, in addition, new information has been gained about the moon. 


imum Codes Study—R. Turyn (in English). (Applied Res. Lab., 
vania Electronic Systems, Waltham, Mass., Final Rept.; January 
1960.) 


‘his report is concerned with finding sequences whose terms are 
or —1, or codes, and whose autocorrelation function is relatively 
ull (except at zero shift). The main interest is in certain optimal 


ook Reviews 


tistical Theory of Signal Detection—Carl W. Helstrom, Ph.D. 
rgamon Press, Inc., New York, N. Y.; 1960. 334 pages. $9.50.) 


‘his book is intended as an interdisciplinary text for the mathe- 
‘ician and radar or communications engineer, to introduce statis- 
1 decision theory and to apply it to problems in signal detection. 
fulfilling his intention “‘to convey toe each some feeling for the 
ject, so that each can exploit it for his own purposes,”’ the author 
produced a very readable volume. Of special merit are the 
ay discussions of the usefulness and meaning of assumptions and 
hods of solution, and the summary on page 330. The detail 
ded for completeness and rigor has naturally been omitted; these 
ssions are noted in the text and references are adequately given. 
‘he applications treated are detection of radar and digital com- 
nication signals in added Gaussian noise. Both white noise and 
e with known autocorrelation are considered, and the integral 
ations and solutions necessary to obtain orthonormal samples 
treated. 
‘he first two chapters fill in necessary background on signals, 
ar filters, and noise. Chapter three introduces decision theory 
its methods, covering all of the usual approaches including 
1ential analysis, though the remainder of the work concentrates 
fixed observation time decisions based on a Neyman-Pearson 
srion. Subsequent chapters deal with the completely known 


4 Book Reviews bf 


codes (autocorrelation function not greater than 1 in absolute value 
except at zero shift). The main results are that no such optimum 
codes exist if of odd length exceeding 13. For even length, certain 
periodic codes are considered and some important results are given. 
As a consequence, no optimum codes of even length less than 144 
but greater than 4 exist. It thus seems likely no optimum codes 
other than those known exist. In addition, certain near-optimum 
codes are discussed. 


Phase-Shift Keying in Fading Channels—H. B. Voelcker (in Eng- 
lish). (Proc. IHE, vol. 107, pt. B, pp. 31-38; January, 1960.) 


Phase-shift keying (psk) is discussed as a modulation technique 
for transmitting digital data over radio circuits subject to fading. 
The modest bandwidth requirements of psk modulation suggest 
that it can not only alleviate spectrum crowding, but can also 
transmit traffic with fewer errors. The theoretical results presented 
here indicate, however, that the random phase perturbations in- 
herent in fading radio signals cause unavoidable degradation in the 
performance of psk systems. Experimental data which partially 
support the theoretical results are cited, and comments relevant 
to the improvement of psk systems are included. 


A Theory of Signals—R. E. Wernikoff (in English). (Res. Lab. of 
Electronics, Mass. Inst. Tech., Cambridge, Mass., Tech. Rept. No. 
331; January 31, 1958.) 

An experiment is presented in which an attempt is made to arrive 
at a mathematical description of physical signals that embodies, 
more realistically than the usual functional representation, our 
limitations in performing measurements. The object is to achieve 
a closer relation between the structure of the mathematical descrip- 
tion and the finite-resolution properties of the detector that char- 
acterize any real measurement process. 

An algebra of signals is obtained, appropriate to the model in 
which essentially frequency-limited signals interact with linear, 
time-invariant systems, and observations are made by means of a 
linear, finite-resolution oscilloscope. The properties of this algebra 
are studied, and a metric that indicates which operators give physi- 
cally indistinguishable outputs is defined. The algebra is used to 
study problems in uniform and nonuniform sampling, the discrimi- 
nation of two events from one in noisy, radar-like systems, and the 
conditions under which a signal is indistinguishable from its short- 
time average. A general procedure for linear, least-peak-error pre- 
diction is obtained. In the limit as the detector resolution becomes 
perfect, the present model is shown to tend smoothly to the usual 
functional model. 


signal, and then progressively with ensembles with unknown ampli- 
tude, time of arrival, and pulse-to-pulse fluctuations. A good de- 
velopment in chapter five shows how minimax approaches and small 
signal approximations can lead to unsatisfactory design by concen- 
trating on weak and useless signals. The theory of estimation is 
introduced and discussed, and applied to the detection problems 
for unknown parameters where the assumption of least-favorable 
values is impractical. In addition there are chapters treating multiple 
signals and detection in clutter, and the detection of stochastic 
signals. 

In keeping with the intention to convey feeling for the subject, a 
discussion on the physical implication of the integral equation-ortho- 
normalization technique as ‘‘whitening,’ and the singular cases 
arising from it would have been a desirable addition. 

This book should be useful not only as an introduction, but also 
to compliment, on one hand, the more practical literature, and on 
the other hand, the more rigorous. It is recommended to both the 
graduate student and to the worker who is trying to understand the 
use of statistical theory in receiver design. 


Pror. T. G. BrrpsaLu 

Cooley Electronics Labs. 
University of Michigan Res. Inst. 
Ann Arbor, Mich. 
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